littlebabyjesus
one of Maxwell's demons
I don't know. It was a genuine question. Given the prevalence of coronaviruses in the last few years, there must be a few, particularly in East Asia.
Good question. What's the answer?
Yeah, every country has its hotspots. Some of the countries are more hotspot-sized, so their figures can be alarming. New York state now has more than 500 deaths per million. If it were a country, it would top the charts atm.
Yes, I've not managed to find anywhere that will compare things on a graph at that level. The site I took those graphs from will break down by US state but not into regions of European countries.
I would be very interested to see things graphed against population density, in city-sized units, across a number of different countries.
The only way to make anything close to meaningful comparisons between geographically separate areas is to factor in population density and to work at a sub-national granularity. Before processing the data this way, you first have to figure out precisely what question you are trying to answer. I suspect that countries are too disparate (structurally, culturally, in reporting mechanisms, healthcare setups, etc, even at various internal levels) to compare easily on a like-for-like basis.
Yep, Belgium's in a mess. If you're breaking things down to more like Belgium-sized chunks, it also makes more sense to view the states of New York and New Jersey as one unit. London on its own, or perhaps London plus its commuter belt (which would be about Belgium-sized), would also not look too pretty.
Need to be a little cautious, according to the Guardian (my emphasis),
Here you go
Interesting, I was surprised population density had such a weak relationship. So, looks down to government policy, particularly how soon lockdown came - less of a surprise.
It doesn't really, at least not on that data, R^2=0.15 is not a strong correlation.
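For anyone wondering what R^2=0.15 amounts to, here is a minimal sketch in plain Python. The `r_squared` helper is my own illustration, not whatever was used to produce the graph; it just computes the squared Pearson correlation, which equals R^2 for a straight-line fit with a single predictor.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a one-variable least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# R^2 = 0.15 corresponds to a correlation coefficient of about 0.39 in magnitude,
# and leaves about 85% of the variance unaccounted for by the fitted line.
implied_r = math.sqrt(0.15)
unexplained_share = 1 - 0.15
```

So a fitted line with that R^2 leaves most of the variation between countries unexplained, whatever is driving it.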
Here you go
I don't think he did: "So, looks down to government policy, particularly how soon lockdown came - less of a surprise". Surely that's saying lockdown has a relationship.
that’s what two sheds said
Yeah, this second bit is important re comparisons. Good info from redsquirrel about the big discrepancies in reporting deaths - glad to hear that that is at least part of why Belgium looks so bad (although of course it also points at others actually being a lot worse than their figures might suggest). But another reason such figures can come up is due to the size of units being compared differing. US state of New York would have the highest deaths per capita in the world for a unit that kind of size if it were a country (San Marino may be ahead, but that's another example of the problem - San Marino is minute), and NY has a larger population than Belgium, but instead, it's lumped in with a Europe-sized unit when doing country comparisons. So there's a bit of a randomising factor here, and the smaller the unit being considered, the greater the chance of big anomalies either way - either from a region missing big hotspots or a region containing them - without necessarily reflecting much on that region's policies. And of course borders are a bit arbitrary - the New York and New Jersey outbreaks are essentially part of the same process given that Jersey City is effectively a suburb of NY City.
Thanks.
He's not really graphed against density though - number of large cities is not the same thing, and you'd need it broken into smaller geographical units to tell you anything useful.
These things always mostly come down to what you’re specifically measuring and how you measure the way it changes. Picking the metric is 90% of the battle. I’m not convinced that a random journalist graphing the first data he comes across is really the best guide.
Yeah, I mean I think lockdown date probably does (or at least can) have a significant effect, it's just that from the data being used there is only a weak relationship.
Also, there's a thing going round correlating woman leaders with more successful management of the epidemic. Let's graph that too.
The data he has chosen to graph has so many problems that if he’s claiming data expertise, that’s worrying.
He’s not just a random journalist. Worth following the discussions on his graphs if data is something you're interested in.
About the escaped bioweapon/germ warfare theory: if C19 had escaped from a Chinese research lab then it would seem likely that the authorities would have known and acted immediately. They may have even succeeded in suppressing both the epidemic and any knowledge of it, at least in the short term.
Yeah good point about assumptions. However, you can still get somewhere with that by making predictions based on those assumptions, no?
The data he has chosen to graph has so many problems that if he’s claiming data expertise, that’s worrying.
Even if you cleaned up the data, though, the more fundamental problem is that it would only act to confirm the biases inherent in the way you cleaned it. We have a qualitative understanding of the way this virus spreads that is way richer than some blunt quantitative measures with problems in it. So we use our qualitative understanding to clean the quantitative measure and lo! We end up confirming what we already qualitatively understood. It’s begging the question.
If you wanted to do this properly, you’d need to standardise the data, because littlebabyjesus is right - smaller data sets are inherently more variable, so the smallest sets will always be at the top and bottom of a list that is derived from the same underlying distribution. So that means understanding the intrinsic correlation - how well a dataset correlates with itself as it grows. And you’d need to allow for skewness, particularly because dependency is unlikely to follow a Gaussian copula (ie rank-normal correlation). And then you need to recognise the “hot spot” issue that teuchter is talking about.
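To put rough numbers on the small-units point, here is a sketch under a deliberately crude assumption: every unit has the same underlying death rate and deaths are independent (a simple binomial model). The unit sizes are hypothetical round numbers, the 500 per million figure is the New York one quoted earlier in the thread, and `rate_standard_error` is just an illustrative helper. Real outbreaks cluster, so the true spread would be wider than this.

```python
import math

def rate_standard_error(true_rate, population):
    """Standard error of an observed per-capita rate under a simple binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / population)

# Assumption: the same underlying risk everywhere, 500 deaths per million.
TRUE_RATE = 500 / 1_000_000

# Hypothetical unit sizes: a micro-state, a mid-sized country, a continent-sized unit.
for population in (30_000, 10_000_000, 300_000_000):
    se_per_million = rate_standard_error(TRUE_RATE, population) * 1_000_000
    low = max(0.0, 500 - 2 * se_per_million)
    high = 500 + 2 * se_per_million
    print(f"{population:>11,}: roughly {low:.0f} to {high:.0f} deaths per million")
```

The standard error shrinks with the square root of the population, so even with identical underlying risk the smallest units would be expected to land at both ends of a ranked list.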
In short, cleaning this data is such a big job that all it does is feed back to you the assumptions you made in cleaning it.
And yes, I’m interested in data. Professionally, not because it’s fun.
I agree. You’re taking advantage of the fact that although the data has big problems, some of those problems (eg reporting lags) are fairly common across data sets. Inaccuracies are not a big problem when they are consistent across time and across datasets.
Yeah good point about assumptions. However, you can still get somewhere with that by making predictions based on those assumptions, no?
For instance, in trying to work out where this is going (doing it for its own sake, not professionally), I've made a few basic and broad assumptions. First, that lockdown works in reducing spread, and that differences in lockdown effectiveness due to differences in the details of implementation are relatively unimportant, so the main determining factor in a country's trajectory from lockdown onwards is the state it was in at lockdown (a state that we can only work out later). And second, adding on to that, that countries that test more will do better as they will have a better idea of where the problems are.
You can then make predictions about where things will go in a place by a) adding in further assumptions based on the various estimates of time-lag regarding incubation-time and getting very sick and dying time, and b) comparing its data on all the aspects relevant to these assumptions to how things have gone in other places.
It seems to be broadly working, despite all the problems in the data.
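The time-lag part of that reasoning can be sketched very simply. The lag values below are illustrative placeholders, not epidemiological estimates, and the function name is made up for the example: if deaths trail infections by incubation time plus time from getting sick to dying, then a lockdown cannot show up in the death figures until roughly that many days later.

```python
from datetime import date, timedelta

# Assumed lags in days (placeholders for illustration only).
INCUBATION_DAYS = 5
ONSET_TO_DEATH_DAYS = 18

def earliest_effect_on_deaths(lockdown_date,
                              incubation=INCUBATION_DAYS,
                              onset_to_death=ONSET_TO_DEATH_DAYS):
    """Earliest date a lockdown could plausibly show up in death figures."""
    return lockdown_date + timedelta(days=incubation + onset_to_death)

# Example: a lockdown starting on 23 March 2020.
effect_date = earliest_effect_on_deaths(date(2020, 3, 23))
```

Lining countries up by days since that date, rather than by calendar date, is one way to compare trajectories under these assumptions.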
The graphs posted on Twitter aren’t in any sense “modelling”. He’s just bunged some uncleansed data through Excel to get a correlation factor.
Don’t disagree with any of that, kabbes, but this is an evolving situation and the modelling should be expected to develop as time goes by. This is what the experts are currently working on, to improve the modelling and therefore the understanding of the pandemic.
As acknowledged by the author there are problems with the data but it’s the best data we have at the moment. Understanding the problems with it hopefully allows better data sources to be developed.