新型肺炎COVID-19の重症化率を推定する - 驚異のアニヲタ社会復帰の予備

読んだ。
Eurosurveillance | Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship, February 2020
ダイヤモンド・プリンセス号のデータを使って、感染した場合の重症化率（というか死亡率）を推定する。結論から言うと全年齢層で1.3%、70歳以上だと6.4%になる。
しかしながら、各日に報告された感染者数と死亡者数から単純に計算した重症化率 naive case fatality ratio (nCFR) は、すべての感染者の転機がわからず、死亡が観測されるまでの時差によるので、この時差 delay を考慮した推定を行う。
この推定は、時刻 $t$ の（報告された）感染者数 $c_t$ 、感染もしくは入院から死亡まで $t$ の時差がある割合 $f_t$ （訳に自信なし）（これは対数正規分布になっている）として、実際に得ている観測数に対しての過小評価分 $u_t$ は
$u_t=\frac{\displaystyle\sum_{i=0}^t\displaystyle\sum_{j=0}^\infty c_{i-j}f_t}{\displaystyle\sum_{i=0}^t c_j}$
を計算することで出来る、とあるが、githubにあるスクリプトを見ても
$u_t=\frac{\displaystyle\sum_{i=0}^t\displaystyle\sum_{j=0}^{\require{color}\textcolor{red}{i}} c_{i-j}f_t}{\displaystyle\sum_{i=0}^t c_{\require{color}\textcolor{red}{i}}}$
な気がするのだが、よくわからない。
GitHub - thimotei/cCFRDiamondPrincess

ダイヤモンド・プリンセス号の死亡者は70歳以上のカテゴリにしかいないので、70歳以上のみにわけたときの重症化率も推定している。

やってみた結果、

Age group	cIFR (95% CI)	cCFR (95% CI)
All ages	1.299 (0.525, 2.657)	2.601 (1.049, 5.323)
70歳以上	6.445 (2.621, 12.785)	12.910 (5.259, 25.611)

となり、70歳以上と、cCFR (corrected CFR) はよいが、全年齢のcIFR (corrected infection fatality ratio) は信頼区間がずれた。

ちなみに、対数正規分布のパラメータは平均 $m$ と中央値 $M$ が与えられていて、これを対数正規分布用のパラメータに変換するには $\mu=\log(M)$ 、 $\sigma=\sqrt{2(\log(M)-\mu)}$ としているが、なんか違うっぽい。
RPubs - 対数正規分布から、平均値(中央値)と標準偏差を指定して乱数を生成させる
図の見た目はあっている。
f:id:MikuHatsune:20200331231832p:plain

正規分布のパラメータ
zMean <- 13
zMedian <- 9.1
propSymptomatic <- 619/309
propOver70Cases <- 124/619
propOver70Deaths <- 7/7
mu <- log(zMedian)
sigma <- sqrt(2*(log(zMean) - mu))

x <- seq(0, 40, length=300)
plot(x, dlnorm(x, mu, sigma), type="l", lwd=5,
     xlab="Days after hospitalisation", ylab="P(death on a given day|death)")


# 一部改変した
scale_cfr <- function(data_1_in, mu, sigma){
  case_incidence <- data_1_in$new_cases
  death_incidence <- data_1_in$new_deaths
  cumulative_known_t <- rep(0, nrow(data_1_in)) # cumulative cases with known outcome at time tt
  # calculating CDF between each of the days to determine the probability of death on each day

  for(ii in 1:nrow(data_1_in)){
    known_i <- 0 # number of cases with known outcome at time ii
    for(jj in 0:(ii - 1)){
      known_jj <- case_incidence[ii - jj]*(plnorm(jj, mu, sigma) - plnorm(jj - 1, mu, sigma))
      known_i <- known_i + known_jj
    }
    cumulative_known_t[ii] <- known_i # Tally cumulative known
  }
  # naive CFR value
  b_tt <- cumsum(death_incidence)/cumsum(case_incidence) 
  # corrected CFR estimator
  p_tt <- cumsum(death_incidence)/cumsum(cumulative_known_t)
  data.frame(nCFR = b_tt, cIFR = p_tt, total_deaths = cumsum(death_incidence), 
             cum_known_t = cumulative_known_t, total_cases = cumsum(case_incidence))
}

newData <- read.table(text="
         date new_cases new_deaths
1  2020-02-05        10          0
2  2020-02-06        10          0
3  2020-02-07        41          0
4  2020-02-08         3          0
5  2020-02-09         6          0
6  2020-02-10        65          0
7  2020-02-11         0          0
8  2020-02-12        39          0
9  2020-02-13        44          0
10 2020-02-14         0          0
11 2020-02-15        67          0
12 2020-02-16        70          0
13 2020-02-17        99          0
14 2020-02-18        88          0
15 2020-02-19        79          0
16 2020-02-20        13          2
17 2020-02-21         0          0
18 2020-02-22         0          0
19 2020-02-23         0          1
20 2020-02-24         0          0
21 2020-02-25         0          1
22 2020-02-26        71          0
23 2020-02-27         0          0
24 2020-02-28         0          2
25 2020-02-29         0          0
26 2020-03-01         0          1
27 2020-03-02         0          0
28 2020-03-03         0          0
29 2020-03-04         0          0
30 2020-03-05        -9          0
")

ageCorrectedDatNew <- cbind.data.frame(date=newData$date,
                                       new_cases=round(newData$new_cases * propOver70Cases),
                                       new_deaths=round(newData$new_deaths * propOver70Deaths))

res0 <- scale_cfr(newData, mu, sigma)
res1 <- scale_cfr(ageCorrectedDatNew, mu, sigma)
res <- list("all_age"=res0, "over_70"=res1)

ci <- mapply(function(z) binom.test(tail(z$total_deaths, 1), round(sum(z$cum_known_t)))$conf.int[1:2], res)
ci <- rbind(mapply(function(z) tail(z$cIFR, 1), res), ci)

# 無症候性陽性患者を考慮した CRF
ci * propSymptomatic