[Esip-machinelearning] Kaggle open dataset challenge for COVID-19

Yuhan Douglas Rao yuhan.rao at gmail.com
Mon Mar 23 17:11:35 EDT 2020


Dear ESIP Machine Learning cluster members,

I hope you are all doing well while we practice social distancing to fight
the pandemic. Bill Teng has shared this open research dataset challenge on
Kaggle
<https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge>
(see details below) with Anne and me. It looks like a very interesting
challenge, and some of us may want to apply our AI skills to it to
"Make Data Matter"!

Thanks, Bill, for sharing! Stay well!
Kind regards,
Yuhan (Douglas) Rao
*Postdoctoral Research Scholar*
North Carolina State University <http://ncsu.edu/>
North Carolina Institute for Climate Studies (NCICS) <https://ncics.org/>
151 Patton Ave, Asheville, NC 28801
e: yrao5 at ncsu.edu
o: +1 828 271 4903

Below are the details of the challenge (
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge).

Dataset Description

In response to the COVID-19 pandemic, the White House and a coalition of
leading research groups have prepared the COVID-19 Open Research Dataset
(CORD-19). CORD-19 is a resource of over 44,000 scholarly articles,
including over 29,000 with full text, about COVID-19, SARS-CoV-2, and
related coronaviruses. This freely available dataset is provided to the
global research community to apply recent advances in natural language
processing and other AI techniques to generate new insights in support of
the ongoing fight against this infectious disease. There is a growing
urgency for these approaches because of the rapid acceleration in new
coronavirus literature, making it difficult for the medical research
community to keep up.

Call to Action

We are issuing a call to action to the world's artificial intelligence
experts to develop text and data mining tools that can help the medical
community develop answers to high priority scientific questions. The
CORD-19 dataset represents the most extensive machine-readable coronavirus
literature collection available for data mining to date. This allows the
worldwide AI research community the opportunity to apply text and data
mining approaches to find answers to questions within, and connect insights
across, this content in support of the ongoing COVID-19 response efforts
worldwide. There is a growing urgency for these approaches because of the
rapid increase in coronavirus literature, making it difficult for the
medical community to keep up.

A list of our initial key questions can be found under the Tasks
<https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks>
section of this dataset. These key scientific questions are drawn from the
NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s
Standing Committee on Emerging Infectious Diseases and 21st Century Health
Threats) research topics
<https://www.nationalacademies.org/event/03-11-2020/standing-committee-on-emerging-infectious-diseases-and-21st-century-health-threats-virtual-meeting-1>
and the World Health Organization’s R&D Blueprint
<https://www.who.int/blueprint/priority-diseases/key-action/Global_Research_Forum_FINAL_VERSION_for_web_14_feb_2020.pdf?ua=1>
for COVID-19.

Many of these questions are suitable for text mining, and we encourage
researchers to develop text mining tools to provide insights on these
questions.

Prizes

Kaggle is sponsoring a *$1,000 per task* award to the winner whose
submission is identified as best meeting the evaluation criteria. The
winner may elect to receive this award as a charitable donation to COVID-19
relief/research efforts or as a monetary payment. More details on the
prizes and timeline can be found on the discussion post
<https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/discussion/135826>.

Accessing the Dataset

We have made this dataset available on Kaggle, and are periodically
updating it from its source. To learn more and access the latest copy of
the dataset, you can also go here:
https://pages.semanticscholar.org/coronavirus-research.

The licenses for each dataset can be found in the all_sources_metadata
csv file.
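
For anyone who wants to check those licenses before mining the text, here is a
minimal sketch of one way to tally them. It assumes the metadata CSV has a
column named `license`; that column name, the `count_licenses` helper, and the
sample rows are illustrative assumptions and are not taken from the challenge
description.

```python
import csv
import io
from collections import Counter


def count_licenses(csv_file, column="license"):
    """Tally how many rows carry each value in the given license column.

    The column name is an assumption; adjust it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        # Treat a missing or blank cell as "unknown" rather than dropping the row.
        value = (row.get(column) or "").strip() or "unknown"
        counts[value] += 1
    return counts


# Hypothetical sample rows standing in for the real metadata file.
SAMPLE = """title,license
Paper A,cc-by
Paper B,cc-by
Paper C,
"""

sample_counts = count_licenses(io.StringIO(SAMPLE))
for name, n in sample_counts.most_common():
    print(name, n)
```

To run this against a downloaded copy, pass a file opened with
`open(path, newline="", encoding="utf-8")` in place of the in-memory sample.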