Tuesday, January 28, 2020

A small observation on the Wuhan pneumonia coronavirus genome sequence using NCBI BLASTN

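At first I just wanted to see how this coronavirus's genome sequence differs from that of SARS.
The search tool does not seem to offer a direct way to do that, though, so let's start from "wuhan".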

https://www.ncbi.nlm.nih.gov/nuccore/?term=wuhan

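Filter the list by Viruses, and results 3 and 4 are the ones that matter
(the two accessions entered into BLASTN below: MN908947.3 and NC_045512.2).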

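There it is: a piping-hot complete genome sequence, found with no effort at all.
The key point is that it comes from the seafood market, one of the focal points of this outbreak.
As for where it all started, as of today (01/28) there are still no leads.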

Next, run BLASTN on results 3 and 4: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Enter MN908947.3 and NC_045512.2 (on two separate lines) and choose Highly similar sequences (megablast).
Because the viral genome is only about 29.9 kb, similar sequences come back quickly.
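
For anyone who would rather script this step than fill in the web form, the same search can be sketched against NCBI's BLAST URL API. This is only a minimal sketch: it builds the submission parameters and sends nothing. The helper name `build_megablast_request` is made up for illustration, the `nt` database is an assumption (the post does not say which database was selected), and the parameter names should be checked against the current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_megablast_request(accessions, database="nt"):
    """Return (url, encoded_body) for a megablast submission.

    Hypothetical helper: it only prepares the request, it does not send it.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide vs nucleotide
        "MEGABLAST": "on",     # "Highly similar sequences (megablast)"
        "DATABASE": database,  # assumption: the default nucleotide collection
        "QUERY": "\n".join(accessions),  # one accession per line
    }
    return BLAST_URL, urlencode(params)


url, body = build_megablast_request(["MN908947.3", "NC_045512.2"])
print(url)
print(body)
```

The encoded body could then be sent as a POST request to the URL; the response handling (polling for the request ID and fetching results) is left out here on purpose.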

Max Score  Total Score  Query Cover  E value  Per. Ident  Accession
26943      35336        95%          0.0      89.12%      MG772933.1
22223      35276        94%          0.0      88.65%      MG772934.1
15213      22564        88%          0.0      82.34%      AY395003.1
15213      22600        88%          0.0      82.34%      AY394996.1
15202      22531        88%          0.0      82.33%      AY304488.1
15202      22529        88%          0.0      82.33%      AY304486.1
15191      22548        88%          0.0      82.32%      AY390556.1
15186      22483        88%          0.0      82.32%      EU371564.1
15186      22276        87%          0.0      82.31%      AY394985.1
15186      22577        88%          0.0      82.32%      AY278554.2
15180      22548        88%          0.0      82.31%      EU371559.1
15180      22507        88%          0.0      82.31%      AY559093.1
15180      22566        88%          0.0      82.31%      AY394994.1
15180      22526        88%          0.0      82.31%      AY394986.1
15176      22618        91%          0.0      82.32%      MK211376.1
15176      22534        91%          0.0      82.30%      KY417146.1
15175      22417        88%          0.0      82.30%      JX163927.1
15175      22424        88%          0.0      82.30%      JX163926.1
15175      22417        88%          0.0      82.30%      JX163923.1
15175      22529        88%          0.0      82.30%      JQ316196.1
15175      22450        88%          0.0      82.30%      FJ882963.1
15175      22574        88%          0.0      82.30%      DQ898174.1
15175      22579        88%          0.0      82.30%      AY864806.1
15175      22529        88%          0.0      82.30%      AY714217.1
15175      22516        88%          0.0      82.30%      AY559096.1
15175      22504        88%          0.0      82.30%      AY559095.1
15175      22502        88%          0.0      82.30%      AY559086.1
15175      22504        88%          0.0      82.30%      AY559085.1
15175      22526        88%          0.0      82.30%      AY559084.1
15175      22513        88%          0.0      82.30%      AY559083.1
15175      22502        88%          0.0      82.30%      AY559082.1
15175      22568        88%          0.0      82.30%      AY274119.3
15175      22579        88%          0.0      82.30%      AY323977.2
15175      22539        88%          0.0      82.30%      AY291451.1
15175      22533        88%          0.0      82.30%      AY502928.1
15175      22526        88%          0.0      82.30%      AY502926.1
15175      22539        88%          0.0      82.30%      AY502923.1
15175      22511        88%          0.0      82.30%      AY394999.1
15175      22561        88%          0.0      82.30%      AY394998.1
15175      22555        88%          0.0      82.30%      AY394995.1
15175      22561        88%          0.0      82.31%      AY394993.1
15175      22566        88%          0.0      82.31%      AY394992.1
15175      22566        88%          0.0      82.30%      AY394991.1
15175      22548        88%          0.0      82.30%      AY394987.1
15175      22535        88%          0.0      82.30%      AY394983.1
15175      22461        88%          0.0      82.30%      AY394978.1
15175      22563        88%          0.0      82.30%      AY357075.1
15175      22561        88%          0.0      82.30%      AY282752.2
15175      22522        88%          0.0      82.30%      AY427439.1
15175      22528        88%          0.0      82.30%      AY283796.1
15175      22535        88%          0.0      82.30%      AP006561.1
15173      22415        88%          0.0      82.30%      JX163925.1
15171      22424        88%          0.0      82.30%      JX163928.1
15169      22420        88%          0.0      82.30%      JX163924.1
15169      22415        88%          0.0      82.30%      GU553363.1
15169      22504        88%          0.0      82.30%      EU371563.1
15169      22509        88%          0.0      82.30%      EU371561.1
15169      22509        88%          0.0      82.30%      EU371560.1
15169      22574        88%          0.0      82.30%      AY864805.1
15169      22535        88%          0.0      82.30%      AY278741.1
15169      22518        88%          0.0      82.29%      AY559087.1
15169      22561        88%          0.0      82.30%      AY394989.1
15169      22546        88%          0.0      82.30%      AY485278.1
15167      22526        88%          0.0      82.29%      AY502927.1
15165      22406        88%          0.0      82.29%      GU553365.1
15163      21673        85%          0.0      82.29%      KJ473816.1
15158      22498        88%          0.0      82.29%      EU371562.1
15149      22403        88%          0.0      82.29%      MK211377.1
15134      22372        86%          0.0      82.26%      KY417145.1
15117      22368        88%          0.0      82.26%      MK211375.1
15084      16332        64%          0.0      82.35%      KF294455.1
15043      21834        86%          0.0      82.18%      JX993988.1
14970      22339        87%          0.0      82.20%      KJ473814.1
14916      22163        87%          0.0      82.16%      DQ648857.1
14892      21819        88%          0.0      82.00%      JX993987.1
14759      23767        91%          0.0      81.89%      KF294457.1
14731      21836        87%          0.0      82.02%      MK211374.1
14722      21360        87%          0.0      81.82%      KJ473813.1
14683      21261        86%          0.0      81.79%      KJ473812.1
14683      21067        86%          0.0      81.79%      KJ473811.1
14683      21958        88%          0.0      81.82%      GQ153542.1
14678      21996        88%          0.0      81.82%      GQ153543.1
14628      21416        87%          0.0      81.74%      KY770860.1
14556      21386        87%          0.0      81.66%      DQ648856.1
14556      21394        87%          0.0      81.66%      DQ412042.1
14550      21830        88%          0.0      81.68%      GQ153547.1
14539      15829        66%          0.0      81.66%      KF294456.1
14517      21878        89%          0.0      81.65%      FJ211859.1
14512      21810        88%          0.0      81.64%      DQ084199.1
14506      21791        88%          0.0      81.63%      GQ153540.1
14506      21797        88%          0.0      81.63%      GQ153539.1
14501      21780        88%          0.0      81.63%      GQ153546.1
14501      21863        89%          0.0      81.63%      DQ022305.2
14501      21849        89%          0.0      81.63%      DQ084200.1
14495      21769        88%          0.0      81.63%      GQ153548.1
14495      21797        88%          0.0      81.62%      GQ153541.1
14484      21758        88%          0.0      81.61%      GQ153545.1
14484      21758        88%          0.0      81.61%      GQ153544.1
13501      20280        79%          0.0      82.94%      KU182964.1
13452      20206        79%          0.0      82.88%      KY938558.1

Switching to the graphic summary:

[Figure: Distribution of the top 200 Blast Hits on 100 subject sequences; query axis ticks from 1 to 27500]


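The Wuhan seafood market sample largely matches the first and second hits, but one stretch of it is left over with no match.
That stretch sits at roughly 22000 ~ 23400, about 1400 bp.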
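
Reading the gap off the picture by eye can also be done from the alignment coordinates. Below is a minimal sketch of that idea: given the query ranges covered by hits, it lists the stretches of the query that nothing covers. The function name `uncovered_regions`, the query length of 29900 and the example ranges are all made up to mimic the picture described above; they are not the actual BLAST output.

```python
def uncovered_regions(query_length, hit_ranges):
    """Return 1-based inclusive (start, end) stretches of the query with no hit."""
    gaps = []
    next_uncovered = 1  # first query position not yet known to be covered
    for start, end in sorted(hit_ranges):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= query_length:
        gaps.append((next_uncovered, query_length))
    return gaps


# Hypothetical alignment ranges on a ~29.9 kb query (illustration only).
example_hits = [(1, 21999), (23401, 29900), (500, 18000)]
print(uncovered_regions(29900, example_hits))
```

With real data the ranges would come from the query start and end of each alignment in the BLAST report.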

Now pull up the newer version (dated 01/23) of the Wuhan seafood market sample's complete genome:
https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3

Extract the sequence at 22000 ~ 23400:
aaaaacaacaaaagttggatggaaagtgagttcagagtttattctagtgcgaataattgcacttttgaatatgtctctcagccttttcttatggaccttgaaggaaaacagggtaatttcaaaaatcttagggaatttgtgtttaagaatattgatggttattttaaaatatattctaagcacacgcctattaatttagtgcgtgatctccctcagggtttttcggctttagaaccattggtagatttgccaataggtattaacatcactaggtttcaaactttacttgctttacatagaagttatttgactcctggtgattcttcttcaggttggacagctggtgctgcagcttattatgtgggttatcttcaacctaggacttttctattaaaatataatgaaaatggaaccattacagatgctgtagactgtgcacttgaccctctctcagaaacaaagtgtacgttgaaatccttcactgtagaaaaaggaatctatcaaacttctaactttagagtccaaccaacagaatctattgttagatttcctaatattacaaacttgtgcccttttggtgaagtttttaacgccaccagatttgcatctgtttatgcttggaacaggaagagaatcagcaactgtgttgctgattattctgtcctatataattccgcatcattttccacttttaagtgttatggagtgtctcctactaaattaaatgatctctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaaagattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaattataattacctgtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacaccttgtaatggtgttgaaggttttaattgttactttcctttacaatcatatggtttccaacccactaatggtgttggttaccaaccatacagagtagtagtactttcttttgaacttctacatgcaccagcaactgtttgtggacctaaaaagtctactaatttggttaaaaacaaatgtgtcaatttcaacttcaatggtttaacaggcacaggtgttcttactgagtctaacaaaaagtttctgcctttccaacaatttggcagagacattgctgacactactgatgctgtccgtgatccacagacacttgagattcttgacattacaccatgttcttttggtggtgtcagtgttataacaccaggaacaaatacttctaaccaggttgctgttctttatca
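
Cutting the stretch out by hand is error-prone, mostly because GenBank positions are 1-based and inclusive while Python slices are 0-based and half-open. A minimal sketch of the conversion follows; `slice_region` is a hypothetical helper and the test sequence is a toy string, not the real genome.

```python
def slice_region(sequence, start, end):
    """Return bases start..end of sequence, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # 1-based inclusive [start, end] maps to the Python slice [start - 1:end].
    return sequence[start - 1:end]


toy = "acgtacgtac"
print(slice_region(toy, 3, 6))  # positions 3..6 of the toy string
```

Under this convention the range 22000 ~ 23400 holds 23400 - 22000 + 1 = 1401 bases.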

Back in BLASTN, I searched with this stretch alone. This time only a few sequences turned up, but the result is interesting.

Sequences producing significant alignments:
Max Score  Total Score  Query Cover  E value  Per. Ident  Accession
237        237          22%          1e-57    80.51%      KY417148.1
231        231          20%          5e-56    81.10%      KC880992.1
231        231          20%          5e-56    81.10%      KC880989.1
231        231          20%          5e-56    81.10%      KC880984.1
230        230          25%          2e-55    78.18%      LC469301.1
226        226          22%          2e-54    79.94%      MK211375.1
220        220          22%          1e-52    79.56%      MK211378.1
198        198          23%          5e-46    77.98%      MG772933.1
193        193          22%          2e-44    78.06%      KY417147.1
193        193          22%          2e-44    78.06%      KY417143.1
193        193          22%          2e-44    78.06%      FJ588686.1
191        191          23%          9e-44    77.31%      KJ473816.1
187        187          22%          1e-42    77.74%      MG772934.1
187        187          22%          1e-42    77.74%      KY417149.1
187        187          20%          1e-42    78.45%      KC881003.1
187        187          20%          1e-42    78.45%      KC880995.1
185        185          23%          4e-42    77.01%      KY770859.1
185        185          23%          4e-42    77.01%      KY770858.1
176        176          20%          2e-39    77.78%      KC881001.1

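[Figure: Distribution of the top 19 Blast Hits on 19 subject sequences; query axis ticks from 1 to 1250]

The stretch around 500 ~ 850 differs from the cluster of hits on the right: only one record matched it, which leaves all the more room for speculation.
That record is the S gene (Spike protein) sequence of a coronavirus taken from a wild bat in Japan:
https://www.ncbi.nlm.nih.gov/nucleotide/LC469301.1?report=genbank&log$=nucltop&blast_rank=5&RID=2VJ0F2KE014

The other hits also come from studies of the Spike protein S gene, but they are all confined to roughly 1100 ~ 1400.
Possible reasons:
1. That block is related to pathogenicity, so far more of it has been sequenced.
2. The block to its left may have been considered unrelated to pathogenicity?
3. The left block has shown up only in this one captured wild-bat sample and nowhere else.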
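
As a rough cross-check, the Query Cover percentages in the table can be turned into base counts and compared with the positions read off the graphic. This is a sketch under two assumptions: the query is the 1401-base stretch (22000 to 23400 inclusive), and the helper `covered_bases` is my own. BLAST rounds Query Cover to whole percent, so the results are approximate.

```python
QUERY_LENGTH = 1401  # assumed: positions 22000..23400 inclusive


def covered_bases(query_cover_percent, query_length=QUERY_LENGTH):
    """Approximate number of query bases behind a Query Cover percentage."""
    return round(query_length * query_cover_percent / 100)


# Query Cover values that appear in the second result table.
for cover in (20, 22, 23, 25):
    print(f"{cover}% of {QUERY_LENGTH} nt is about {covered_bases(cover)} nt")
```

The 25% row (LC469301.1) works out to about 350 bases, in line with the 500 ~ 850 stretch, while the 20% to 23% rows come to roughly 280 to 320 bases, in line with 1100 ~ 1400.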

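Just from comparing most of the bat coronavirus genome sequences with the Wuhan seafood market sample,
I found that the Wuhan sample carries a complete stretch at 22000 ~ 23400 that megablast could not match in them.
Setting aside the possibility that this is simply a hard-to-sequence region, the Wuhan sample still has this extra stretch.
Is it the result of natural evolution, or of artificial gene splicing???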

Let's keep watching.

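The top priority now is to identify the source and to make testing simple and fast. After all, the fact that
it can spread during the incubation period really undercuts fever screening. Even more important is to produce
a treatment and a vaccine in the shortest possible time. Patients who are already sick cannot wait long.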

References:
1. NCBI https://www.ncbi.nlm.nih.gov/
               https://blast.ncbi.nlm.nih.gov/Blast.cgi
2. Comparison of coronaviruses in various animals and humans https://www.coa.gov.tw/ws.php?id=5092