What is Data Science?
Data science is the study of data. It involves methods for recording, storing, and analyzing data across different domains in order to extract meaningful information. The goal of data science is to create insight and knowledge from data, which can be both structured and unstructured.
Data science is related to computer science but is a separate field. Computer science involves creating programs and algorithms to record and process data, while data science covers any type of data analysis using scientific methods, processes, algorithms, and systems. It employs techniques and theories drawn from many fields, including mathematics, statistics, computer science, and information science. It is most closely related to the mathematical field of statistics, which includes the collection, organization, analysis, and presentation of data. It has been described as a concept to “unify statistics, data analysis, machine learning and their related methods”.
Because of the large amounts of data modern companies and organizations maintain, data science has become an integral part of IT. Data science should not be confused with data analytics. Both fields are ways of understanding big data, and both often involve analyzing massive databases using R and Python.
Companies that hold petabytes of user data may use data science to develop effective ways to store, manage, and analyze it. Different scientific methods can be used to run tests and extract results that provide meaningful insights about their users.
Below is a practical example of data science:
“GOOGLE: MACHINE-LEARNING FOR METASTASIS
Location: Mountain View, California
How it’s using data science: Google hasn’t abandoned applying data science to health care. In fact, the company has developed a new tool, LYNA, for identifying breast cancer tumors that metastasize to nearby lymph nodes. That can be difficult for the human eye to see, especially when the new cancer growth is small. In one trial, LYNA — short for Lymph Node Assistant —accurately identified metastatic cancer 99 percent of the time using its machine-learning algorithm. More testing is required, however, before doctors can use it in hospitals”.
“How Data Science can help in COVID-19?”
The main cause of the wide spread of the coronavirus is the lack of information about the early-stage symptoms. This has led to a situation where people are not aware whether they are affected or not. They travel from one place to another with no clue that they are carrying the virus with them.
Governments have now started collecting information about citizens, such as their travel history and medical records. This has resulted in the collection of huge amounts of citizen data. Countries have already started processing this data with the help of “Big Data tools”.
Processing the data of billions of citizens involves “removing redundancy, scaling the data, and structuring” it for further use. This is only possible with the help of various essential “Big Data” tools. Many of these sources pull from data provided by trusted bodies such as the U.S. Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO). They also include direct links to those bodies so that people have quick, easy access to reliable information.
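The three processing steps named above (removing redundancy, scaling, and structuring) can be sketched in a few lines of Python. This is a minimal illustration only: the `CitizenRecord` fields, the sample rows, and the choice of min-max scaling are all assumptions made for the example, not a description of any government's actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitizenRecord:
    """Hypothetical structured record; the field names are illustrative."""
    citizen_id: str
    age: int
    trips_last_month: int


def deduplicate(rows):
    """Remove redundancy: keep the first row seen for each citizen_id."""
    seen = set()
    unique = []
    for row in rows:
        if row["citizen_id"] not in seen:
            seen.add(row["citizen_id"])
            unique.append(row)
    return unique


def min_max_scale(values):
    """Scale numbers to the range [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def structure(rows):
    """Convert loosely typed rows into typed CitizenRecord objects."""
    return [
        CitizenRecord(
            citizen_id=str(row["citizen_id"]),
            age=int(row["age"]),
            trips_last_month=int(row["trips_last_month"]),
        )
        for row in rows
    ]


# Made-up sample rows, including one exact duplicate.
raw_rows = [
    {"citizen_id": "A1", "age": "34", "trips_last_month": "2"},
    {"citizen_id": "B2", "age": "61", "trips_last_month": "0"},
    {"citizen_id": "A1", "age": "34", "trips_last_month": "2"},
    {"citizen_id": "C3", "age": "25", "trips_last_month": "4"},
]

records = structure(deduplicate(raw_rows))
scaled_trips = min_max_scale([r.trips_last_month for r in records])
print(records)
print(scaled_trips)
```

At real scale the same steps would run on distributed tooling rather than in-memory lists, but the logic of each step stays the same.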
– After the collection and processing of such huge data, the government authorities analyze and visualize it. The work can follow the basic steps below:
Help the people around you interpret data and information
Translate information into more languages
Prepare data related to the response
Analyze data that is not directly related to the response
Research using existing disaster response datasets
– By analyzing the data and visualizing the trends in it, Data Science helps the governments make estimates about the scope of further spread of the disease, the available medical infrastructure to admit affected patients, and the budget required for all of this.
– With the help of these estimations, Data Science is helping the governments make arrangements for medical facilities and capital to spend on their citizens.
The “U.S. Centers for Disease Control and Prevention (CDC)” is working with researchers at the “Machine Learning Department of Carnegie Mellon University” to forecast the spread of coronavirus.
– The team built a machine-learning model that processes data collected from several sources such as flu-related Google searches, Twitter activity, and web traffic to predict the spread of the virus.
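To make the idea of predicting spread from an online signal concrete, here is a minimal sketch that fits a straight line between a search-volume index and reported cases. It is not the team's model: a one-feature least-squares fit is swapped in for simplicity, and the numbers are synthetic values invented so that the relationship is exactly linear.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new signal value."""
    return slope * x + intercept


# Synthetic, made-up numbers: weekly search-volume index and reported cases.
# The cases are constructed as 2 * index + 1, so the fit is exact.
search_index = [10, 20, 30, 40, 50]
reported_cases = [21, 41, 61, 81, 101]

slope, intercept = fit_line(search_index, reported_cases)
forecast = predict(slope, intercept, 60)
print(round(slope, 3), round(intercept, 3), round(forecast, 1))
```

A real forecasting model would combine several signals, handle noise and reporting delays, and be validated against held-out data.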
The scientific community as a whole has made significant efforts that give the data science community a unique opportunity.
– One such example is “The COVID-19 Open Research Dataset (CORD-19)”, an extensive machine-readable collection of the available coronavirus literature. CORD-19 is a resource of over 52,000 scholarly articles, including over 41,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.
There are thousands of JSON files, each containing the text of a research paper, including its references.
Because the text is unstructured, there are data quality issues, including (but not limited to) correctly identifying the primary author’s country. These have to be cleaned up.
Once the data is cleaned up, we can apply various NLP algorithms to gain insight and intuition into it.
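The load, clean, and analyze flow described above might look like the following sketch. The JSON field names (`metadata`, `title`, `body_text`, `text`) are an assumption about the file layout rather than a verified schema, the sample paper is invented, and term counting stands in for the "NLP algorithms" as the simplest possible example.

```python
import json
import re
from collections import Counter

# A tiny stand-in for one paper file. The field names are an assumption
# about the JSON layout, and the content is made up for illustration.
sample_paper = json.dumps({
    "paper_id": "example-0001",
    "metadata": {"title": "Transmission of a Novel Coronavirus"},
    "body_text": [
        {"text": "The virus spreads through respiratory droplets [1]."},
        {"text": "Early detection of the virus limits transmission."},
    ],
})

STOPWORDS = {"the", "of", "a", "through", "and", "in", "to"}


def extract_text(raw_json):
    """Join the title and body paragraphs of one paper into a single string."""
    paper = json.loads(raw_json)
    title = paper.get("metadata", {}).get("title", "")
    paragraphs = [p.get("text", "") for p in paper.get("body_text", [])]
    return " ".join([title] + paragraphs)


def clean(text):
    """Lowercase, drop citation markers such as [1], and keep words only."""
    text = re.sub(r"\[\d+\]", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def top_terms(tokens, n=3):
    """Count non-stopword tokens and return the n most frequent."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


tokens = clean(extract_text(sample_paper))
print(top_terms(tokens))
```

With the real dataset, `extract_text` would be applied to each file on disk, and the token lists could feed more capable techniques such as topic modeling or named-entity recognition.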
– The science community can answer high-priority scientific questions related to COVID-19 with the help of such datasets, data mining and other extraction techniques.
Data Science Can Give Accurate Pictures of Coronavirus Outcomes [Situation Awareness]
Medical professionals and others must get correct and up-to-date information about how the coronavirus situation changes day by day. Several organizations, including Johns Hopkins University, IBM and Tableau, have released interactive databases that offer real-time views of what’s happening with the virus.
These databases report the number of confirmed cases, fatalities, and recoveries.
Data Science Can Help Track the [“Spread”]
Data science specialists have also concluded that graph databases are instrumental in showing them how COVID-19 spreads. For example, BlueDot was able to predict the early spread of the illness from Wuhan to other Asian cities based on airline ticketing data.
A graph database shows links between people, places or things.
Scientists refer to each of those entities as a node, and the connections between them are the “edges.” The results give a visual representation of the relationship between things, if any.
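The node-and-edge idea can be illustrated with a small in-memory graph. This is only a sketch of the data structure, not a graph database; the `Graph` class and the people and places in it are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Undirected graph stored as an adjacency map: node -> set of neighbours."""

    def __init__(self):
        self.adjacency = defaultdict(set)

    def add_edge(self, a, b):
        """Connect two nodes; each edge is stored in both directions."""
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def neighbours(self, node):
        """Nodes directly connected to `node` (empty set if unknown)."""
        return set(self.adjacency.get(node, set()))

    def two_hops(self, node):
        """Nodes reachable through one intermediate node, excluding `node`."""
        result = set()
        for middle in self.neighbours(node):
            result |= self.neighbours(middle)
        result.discard(node)
        return result


# Hypothetical people and places; an edge means "this person visited this place".
graph = Graph()
graph.add_edge("person:alice", "place:market")
graph.add_edge("person:bob", "place:market")
graph.add_edge("person:bob", "place:station")
graph.add_edge("person:carol", "place:station")

print(graph.neighbours("place:market"))
print(graph.two_hops("person:alice"))
```

Here `two_hops` answers the question "who shared a place with this person?", which is the kind of relationship query a graph database is designed to make cheap.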
In the early days of the coronavirus outbreak, Chinese data scientists built a graph database tool called Epidemic Spread.
It allowed people to type in identifying information associated with the journeys they took, such as a flight number or even a car’s license plate.
The database would then tell those users whether anyone with a confirmed coronavirus case took those same trips and may have spread it to fellow passengers.
Mobile phone data can play a key role in tracking the movement of people to help identify where the disease is likely to spread.
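The trip lookup described for Epidemic Spread can be approximated with a simple index from trip identifier to confirmed cases. This sketch uses a plain dictionary in place of a graph database, and the flight numbers and licence plate are hypothetical.

```python
def normalise(trip_id):
    """Make lookups tolerant of case and spacing differences."""
    return "".join(trip_id.split()).upper()


def build_index(confirmed_trips):
    """Map each trip identifier to the number of confirmed cases on it."""
    index = {}
    for trip_id in confirmed_trips:
        key = normalise(trip_id)
        index[key] = index.get(key, 0) + 1
    return index


def check_trips(index, user_trips):
    """Return the user's trips that match at least one confirmed case."""
    return [t for t in user_trips if normalise(t) in index]


# Hypothetical trip identifiers reported for confirmed cases.
confirmed = ["XX 123", "yy456", "PLATE-001", "XX123"]
index = build_index(confirmed)

# A user types in the journeys they took.
exposed = check_trips(index, ["xx123", "ZZ789", "plate-001"])
print(exposed)
```

Normalising identifiers before comparing them matters in practice, because users type flight numbers and plates inconsistently.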
Big data analytics can cross-reference disease data against high-risk senior residents, down to postcode level, and against the incidence of factors such as diabetes or obesity.
Data Scientists Can Handle [“Contact Tracing”]
Contact tracing is an effective way to slow COVID-19. It involves getting in touch with a person’s close contacts after that individual tests positive for the virus and telling them to self-isolate. Contact tracing is time-consuming, although it’s getting easier as more people take social distancing seriously.
Researchers have “created a mobile phone-based solution” to eliminate the need for people to call the contacts manually. Instead, those parties get text messages confirming the need for self-isolation. The researchers clarify that their approach would be most effective if it gets support from national leaders and is not an effort primarily spearheaded by independent app developers.
No nations are using this method yet. Given the market penetration of mobile phones and the familiarity people have with receiving texts, however, it’s easy to see why this approach makes sense.
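A phone-based approach like the one described could, in outline, look up a positive case's recent contacts and prepare a text message for each. The sketch below is an illustration under stated assumptions: the contact log format, the names, and the 14-day `window_days` default are hypothetical, and no messages are actually sent.

```python
from datetime import date, timedelta


def close_contacts(contact_log, case, test_date, window_days=14):
    """Return people who met `case` within `window_days` before the test date."""
    earliest = test_date - timedelta(days=window_days)
    contacts = set()
    for person_a, person_b, met_on in contact_log:
        if not (earliest <= met_on <= test_date):
            continue
        if person_a == case:
            contacts.add(person_b)
        elif person_b == case:
            contacts.add(person_a)
    return sorted(contacts)


def build_messages(contacts):
    """Create one text message per contact; nothing is sent here."""
    template = "{name}: you were near a confirmed case. Please self-isolate."
    return [template.format(name=name) for name in contacts]


# Made-up contact log: (person, person, date of encounter).
log = [
    ("ana", "ben", date(2020, 4, 1)),
    ("cleo", "ana", date(2020, 4, 5)),
    ("ana", "dev", date(2020, 3, 1)),   # outside the 14-day window
    ("ben", "cleo", date(2020, 4, 6)),  # does not involve the case
]

contacts = close_contacts(log, "ana", date(2020, 4, 7))
messages = build_messages(contacts)
print(messages)
```

The length of the look-back window is a public health decision, which is why it is left as a parameter here.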
Data Science [“Managing the pandemic”]
Data science can play a central role in analyzing the large-scale testing of people.
AI is also being used to accelerate drug development to treat COVID-19.
Google’s DeepMind AI system is being used to identify characteristics of the virus that may help to understand how it functions.
UK-based BenevolentAI is using AI to identify promising existing treatments for other illnesses that could be effective in treating COVID-19.
The more data there is, the more accurate these predictions can be, and the better the pandemic can be managed.
Data Scientists to Find [“Possible Cures”]
Besides the race to restrict the COVID-19 spread, scientists are working as quickly as possible to uncover effective treatments.
Two graduates of the data science program at Columbia University have turned to machine learning to help.
The typical process of antibody discovery in a lab takes years.
This approach, however, takes only a week to screen for therapeutic antibodies with a high likelihood of success.
The team taking this approach says this method is less costly than traditional ones, too. Humans are still part of the process because they have to test the gene sequences identified as most promising by the machine learning algorithm. However, using this expedited method could be crucial in efficiently finding interventions that work for coronavirus patients.
Data Scientists to [“Predict Future Outbreaks”]
The data that is collected from this pandemic will be invaluable in understanding how best to deal with future outbreaks.
Global disease surveillance will be an important part of the battle against future pandemics.
The more data we collect, the better data science and AI will be able to help us.