"Statistics: The Mathematical Theory of ignorance." Morris Kline
To this day, many people distrust statistics and facts presented in the form of statistics. George Bernard Shaw is credited with saying, "It is the mark of a truly intelligent person to be moved by statistics," while Mark Twain is said to have accused statistics of being "more pliable" than facts.
So why are people so reluctant to trust answers derived from statistics, and why has there always been resistance to using statistical methods and statistical modelling tools to help solve problems?
Well, statistics has always been, and remains, the favourite piece of evidence for writers, businessmen and speakers, because it provides numerical evidence to support an argument. Statistics is all about connecting the dots and trying to find the facts. Yet despite everything the field has to offer, people still react to statistics with scepticism, because they do not know exactly who is providing the figures or what that person had to do to arrive at them.
So, to move away from the idea that statistics are the same as lies, let's see who actually comes up with these statistics and exactly how they do it.
Statisticians come up with statistics using statistical modelling techniques, and they do so in order to solve real-world problems. Hence the role of statisticians is quite important, right?
Statistical Modelling: Techniques Used for Statistics
We are all by now quite familiar with statistical modelling techniques, but one thing we are less sure of is how these techniques actually affect the world we live in. Statisticians have been around for a long time, and they have always worked on generating answers to the problems that arise.
Those answers come in the form of quantifiable data: numbers that make sense when seen in the bigger scheme of things. Statisticians have always used data to solve problems and still do so today. The difference is that they now rely on machines and computer-based technology to make sure their results are produced accurately. So how do statisticians work with data using computer programmes?
Well, statisticians use programming languages such as Python to write code, or algorithms, that tell the computer what needs to be done with the data they have collected, and when. Making sense of the collected data is the main part of what statisticians do. (It was the main part of their work in the past and remains a big part of it today.) Statisticians can make sense of data in several ways. One such way is by using a simple linear regression model.
Simple Linear Regression Models as a Method of Compiling Data
When collecting data, one of the things statisticians always have to identify is a dependent variable (the response) and an independent variable (the predictor). The simple linear regression model lets statisticians study the relationship between two continuous variables. Looked at closely, the model comes with a fair amount of jargon that may fly over the head of the layman. For instance, statisticians using this technique are expected to distinguish between a deterministic relationship, where one variable fixes the other exactly, and a statistical relationship, where the two variables move together only on average.
Statisticians also need to understand the least squares criterion, which picks the line that minimises the sum of squared differences between the observed and predicted values. They must also know how to derive an estimate of the variance of the errors around that line. While these concepts may seem confusing at first, the simple linear regression model is but one of the models that statistics students learn about. There are several other statistical models that they may come across during their years of study.
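To make the jargon above a little more concrete, here is a minimal Python sketch of a simple linear regression fitted by least squares. The data (hours studied against test scores) and the function name fit_simple_linear_regression are made up for illustration; they are assumptions, not taken from any real study or library.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, error_variance).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates of the slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals: observed value minus the value predicted by the line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Estimate of the error variance, using n - 2 degrees of freedom
    # because two parameters (intercept and slope) were estimated.
    error_variance = sum(r ** 2 for r in residuals) / (n - 2)
    return intercept, slope, error_variance


# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

intercept, slope, error_variance = fit_simple_linear_regression(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"estimated error variance: {error_variance:.3f}")
```

With this made-up data the slope comes out at about 4.1, so each extra hour of study is associated with roughly four more marks on average. The points do not sit exactly on the line, which is what makes the relationship statistical rather than deterministic.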
Basic Data Skills
Put yourself in the shoes of a data analyst for a second. To be considered a data professional in even the most basic sense, there are some skills you ought to have, known as the basic data skills. These involve showing that you can analyse data and make informed decisions based on what you have analysed. The skills that all data analysts need include:
- Structured Query Language (SQL)
- Microsoft Excel
- Critical thinking skills and problem-solving skills
- R or Python programming language
- Data visualization
- Presentation skills
- Machine skills or computer programming skills
These are just some of the basic data-related skills that you will need to have mastered if you are going to sit at a data scientist's desk for the day.
Data Science Skills
Remember that when you first qualify, the jobs you are likely to get will be entry-level data analyst jobs. While you may wish to climb the corporate ladder of success, you have to start at the bottom. Once you have gained some experience in various data analyst jobs, you can advance to a data scientist role.
Let's now imagine a day in the life of a data scientist. To be a data scientist, you must be well equipped with theoretical data science knowledge and the data skills needed to collect data and sift through the large amounts you have collected.
At the core of a data analyst's or data scientist's skill set is the ability to clean and prepare data continually. A data scientist must also watch for data that is missing or inconsistent with the rest of the findings. Sometimes the work starts with simply retrieving the data and taking a good look at it. Many consider retrieving and sorting through the data to be the most fun part of the job, and the problem-solving involved is often cited as the reason data scientists enjoy what they do.
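As a rough illustration of the cleaning step described above, the sketch below tidies a handful of records that contain missing and inconsistent values. The rows, the field names city and sales, and the helper clean_row are all hypothetical, and real projects often use a dedicated data library for this instead of plain Python.

```python
# Hypothetical raw records, as they might arrive from a spreadsheet export.
raw_rows = [
    {"city": " Durban", "sales": "1200"},
    {"city": "durban", "sales": ""},           # missing value
    {"city": "Cape Town", "sales": "950"},
    {"city": "CAPE TOWN ", "sales": "n/a"},    # inconsistent placeholder
    {"city": "Cape Town", "sales": "1010"},
]


def clean_row(row):
    """Normalise the city name and convert sales to a number (or None)."""
    city = row["city"].strip().title()
    try:
        sales = float(row["sales"])
    except ValueError:
        # Empty strings and placeholders such as "n/a" are treated as missing.
        sales = None
    return {"city": city, "sales": sales}


cleaned = [clean_row(row) for row in raw_rows]

# Separate complete rows from rows that still need attention.
complete = [row for row in cleaned if row["sales"] is not None]
missing = [row for row in cleaned if row["sales"] is None]

print("cities:", sorted({row["city"] for row in cleaned}))
print("complete rows:", len(complete))
print("rows with missing sales:", len(missing))
```

Whether missing rows should be dropped, filled in or followed up with the data owner is a judgment call that depends on the question being asked; the sketch only flags them.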
During the day, a data scientist may use SQL to pull data from the company's database. Once the data has been retrieved, they may use a programming language such as Python to analyse it. Once you have explored the data set, you need to formulate a question based on it, and then use the same data to find the answer. If you are a data scientist, none of these tasks seems overly tedious. While you look for the answer to the question you posed, you must also look for trends and patterns in the data set you retrieved. So how did you manage to become this confident in data science and data analysis? Well, it all started with the study of statistics.
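The retrieve, question and trend steps described above can be sketched in a few lines of Python. This is only an illustration: the in-memory SQLite database stands in for a company database, and the orders table, its columns and its figures are invented for the example.

```python
import sqlite3

# Hypothetical stand-in for a company database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, month INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", 1, 100.0), ("North", 2, 120.0), ("North", 3, 150.0),
        ("South", 1, 200.0), ("South", 2, 190.0), ("South", 3, 180.0),
    ],
)

# Step 1: retrieve the data with SQL.
rows = conn.execute(
    "SELECT region, month, revenue FROM orders ORDER BY region, month"
).fetchall()
conn.close()

# Step 2: analyse it in Python. Question: which region earns more on average?
revenue_by_region = {}
for region, month, revenue in rows:
    revenue_by_region.setdefault(region, []).append(revenue)

average_revenue = {
    region: sum(values) / len(values)
    for region, values in revenue_by_region.items()
}
top_region = max(average_revenue, key=average_revenue.get)

# Step 3: look for a trend, here the month-over-month change per region.
monthly_change = {
    region: [later - earlier for earlier, later in zip(values, values[1:])]
    for region, values in revenue_by_region.items()
}

print("highest average revenue:", top_region)
print("month-over-month change:", monthly_change)
```

In this made-up data, South has the higher average but shrinks every month while North grows, which is the kind of pattern an average on its own would hide.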
Statistical Knowledge as a Field Intertwined with Data Science
It is recommended that, before completing a master's degree in data science, you first complete an undergraduate degree in either mathematical statistics or applied statistics. When you study statistics at undergraduate level, you learn the basics of statistics and probability theory. To make heads or tails of what the data means, you need this basic statistical knowledge. It also helps you avoid common logical errors and fallacies when interpreting the data in front of you.
Again, exactly which theory you need to recall from your years of studying statistics will depend on what your particular company requires.
Statistical knowledge then enables you to visualize the data in front of you so that you can understand it better and make sense of it. Plots and charts are how data analysts and scientists communicate data, and they also let data scientists see the data for themselves. Things hidden within a data set often become far more apparent when the data is presented visually in the form of a graph.
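As a small illustration of how a picture can reveal what a summary number hides, the sketch below compares two made-up series with the same mean. It draws a crude text bar chart using only the standard library so that it runs anywhere; in practice a plotting library would normally be used. The series and the helper text_bar_chart are hypothetical.

```python
from statistics import mean


def text_bar_chart(values, scale=1):
    """Return one line per value, with a bar of '#' proportional to it."""
    return [
        f"{i + 1:>2} | {'#' * round(v / scale)} {v}"
        for i, v in enumerate(values)
    ]


# Two hypothetical series of daily sales with identical means.
steady_sales = [10, 10, 10, 10, 10, 10]
spiky_sales = [4, 4, 4, 4, 4, 40]

print("means:", mean(steady_sales), mean(spiky_sales))
for title, series in [("steady", steady_sales), ("spiky", spiky_sales)]:
    print(title)
    print("\n".join(text_bar_chart(series)))
```

Both series average 10, yet the chart makes it obvious that the second one is driven by a single unusual day.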
So, having heard all about what a data scientist does, are you all the more enthusiastic to advance in the field of data science? Is this a job that you feel will suit your personality? If you answered yes, congratulations: you are headed towards a spot in one of the most lucrative career fields to date. Soon, data analysis and data inspection will become second nature. For some more jargon from the data science field, perhaps read the article "Are you constantly using regression analysis theory without knowing it?"