To transform data into knowledge, we need to structure it properly and understand its meaning: which qualities it displays and which trends it indicates. Descriptive statistical methods are used for such analysis.
This branch of statistics does not make forecasts or develop assumptions about broader groups of data. It only focuses on a limited set of data which has been measured directly. This is an important step for any researcher: it is useful to obtain complete knowledge about available data before venturing deeper into the unknown and trying to analyze other areas.
Main Features of Descriptive Statistics
Statistical research typically starts with picking a specific data set and measuring all of its important parameters. Then a detailed summary must be provided, using several standard metrics. They are divided into two groups:
- central tendency metrics: mean, median, and mode
- variability or dispersion metrics: variance, minimum & maximum, kurtosis, and skewness.
A parameter that varies within the given population (i.e. the set of specimens that have been measured) can be summarized by calculating its “expected value”. This is done by summing all the distinct values found when measuring the specimens, with each of them multiplied by the probability of finding such a value in this specific set. E.g., if 2 people from a group of 10 are 2 meters tall, then the value of 2 gets the probability of 2/10 = 20%. The mean value might not be equal to any of the values found during examination: its formula returns an estimate that lies within the range of observed values, it does not pick one value from this range. The simplest example of a mean is the arithmetic average, which is calculated by dividing the sum of all measured values by the number of measurements.
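The relationship between the probability-weighted sum and the plain average can be checked with a minimal sketch. The heights below are hypothetical illustration data, not measurements from any real group.

```python
from collections import Counter

# Hypothetical heights (in meters) of a group of 10 people.
heights = [1.6, 1.7, 1.7, 1.8, 1.8, 1.8, 1.9, 1.9, 2.0, 2.0]

# Relative frequency of each distinct value, e.g. 2.0 occurs 2 times out of 10.
counts = Counter(heights)
probabilities = {value: count / len(heights) for value, count in counts.items()}

# Expected value: each distinct value weighted by its relative frequency.
expected_value = sum(value * p for value, p in probabilities.items())

# Arithmetic mean: sum of all measurements divided by their number.
arithmetic_mean = sum(heights) / len(heights)

print(probabilities[2.0])
print(round(expected_value, 2), round(arithmetic_mean, 2))
```

Both calculations give the same result (up to floating-point rounding), and that result is not one of the measured heights.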
During statistical research it is sometimes useful to define a parameter value which separates the range of values we have found in our sample into 2 halves. It is called the median and is essentially the “middle” value of the sorted range: half of the other values are bigger and the other half are smaller than it. Unlike the mean, the median is one of the “actual” values whenever the number of measurements is odd (with an even number it is usually taken as the average of the two middle values), and it is less sensitive to extreme values. This can make it more informative when analyzing such things as household incomes or footwear sizes: you would certainly want to obtain a real foot size in order to deliver fitting footwear.
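A short sketch using Python's standard `statistics` module shows how the median behaves next to the mean. The income figures are hypothetical and only serve as an illustration.

```python
import statistics

# Hypothetical yearly household incomes (in thousands); one household is an outlier.
incomes = [21, 23, 25, 27, 30, 32, 250]

mean_income = statistics.mean(incomes)      # pulled upwards by the outlier
median_income = statistics.median(incomes)  # the middle value of the sorted list

# With an even number of values, the two middle values are averaged.
even_median = statistics.median([21, 23, 25, 27])

print(mean_income, median_income, even_median)
```

Here the median is an actual income from the list, while the mean is higher than every income except the outlier.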
A value that is most often found within the given range of parameter values is called the mode of this parameter. Being the most typical value for the selected group of specimens, it can sometimes tell a lot about them.
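As a small illustration, the mode can be found with the standard `statistics` module. The shoe sizes are made-up sample data.

```python
import statistics

# Hypothetical shoe sizes measured in a small group.
shoe_sizes = [38, 39, 39, 40, 40, 40, 41, 42]

# The single most frequent value.
most_common_size = statistics.mode(shoe_sizes)

# multimode returns every value tied for the highest count.
tied_sizes = statistics.multimode([38, 39, 39, 40, 40, 41])

print(most_common_size, tied_sizes)
```

A range can have more than one mode, which is why the second call returns a list.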
Variance or Standard Deviation
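This value shows how widely the values are spread around the mean. Variance is the average squared deviation from the mean, and standard deviation is its square root, expressed in the same units as the data. The lower it is, the closer most of the values in the range are to the mean of this range. High variance means bigger differences. In particular, it might indicate that there are both very big and very small values.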
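The sketch below compares a widely spread range with a tightly grouped one. It assumes the measured set is treated as the whole population, so it uses the population versions `pvariance` and `pstdev` from the standard `statistics` module. The numbers are illustrative only.

```python
import statistics

# Two hypothetical ranges with the same mean (5) but different spread.
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
tight = [5, 5, 5, 5, 5, 5, 5, 5]

# Population variance: average squared deviation from the mean.
spread_variance = statistics.pvariance(spread_out)
# Population standard deviation: square root of the variance.
spread_std_dev = statistics.pstdev(spread_out)

tight_variance = statistics.pvariance(tight)

print(spread_variance, spread_std_dev, tight_variance)
```

If the set is instead a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.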
Minimum and Maximum
It often helps to understand the properties of a specimen group by finding the biggest and the smallest values of a parameter. Sometimes knowing the bounds of a certain parameter helps to avoid unnecessary analysis. For example, if we have a group of households to analyze and the maximum income in this group is quite low for this region, we can reasonably assume that no family from this group would be interested in buying luxury cars.
Kurtosis
A value indicating the shape of the probability graph for a value range: how sharply peaked it is and how heavy its tails are, i.e. how prone the range is to extreme values. It is used for more complex analysis.
Skewness
This value measures the asymmetry of the values in a given range around the mean: whether they stretch further below the mean or further above it.
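Skewness and kurtosis can be sketched from central moments in plain Python. The formulas below are the common population (moment-based) definitions, with kurtosis reported as "excess" kurtosis relative to a normal distribution. The function names are hypothetical helpers, and statistical packages may apply additional sample corrections.

```python
def central_moment(values, order):
    """Average of (value - mean) ** order over all values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by squared variance, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # evenly balanced around the mean
right_skewed = [1, 1, 1, 2, 10]  # one large value stretches the upper side

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

A skewness of zero indicates a symmetric range, a positive value indicates a longer tail above the mean, and a negative value a longer tail below it.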
Depending on the number of parameters to analyze, there are different types of descriptive statistical research:
- Univariate: if there is only 1 parameter to analyze. E.g. when analyzing households, we only check their income and nothing else.
- Bivariate/Multivariate: if there are 2 or more parameters. In this case we not only measure the parameters but also explore how they relate to each other.
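One common way to explore how two parameters relate is the Pearson correlation coefficient. It is used here as an illustrative choice, since the text does not name a specific method. The sketch computes it by hand for hypothetical income and spending figures. Note that a strong correlation shows that two parameters move together, not that one causes the other.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(squares_x * squares_y)

# Hypothetical household data: yearly income and yearly spending (in thousands).
income = [20, 30, 40, 50, 60]
spending = [15, 22, 24, 38, 41]

r = pearson_correlation(income, spending)
print(round(r, 3))
```

Values of `r` close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.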
Where Descriptive Statistics Can Be Used
It is usually necessary to extract all important data from a small set of specimens before proceeding with conclusions about other groups. It is especially useful in the following situations:
- Education: having complete information about its pupils’ performance, a school can assess the efficiency of its programme
- Healthcare: patients’ data are analyzed on a local scale before assuming trends on a regional or global scale
- Sales: each retail or e-commerce company uses some of these methods to understand its performance.
To sum up, the following steps are usually taken by an analyst who has obtained a limited set of specimens and has measured their parameters:
- Collect available specimens and define their key parameters
- Perform all necessary measurements
- Calculate central tendency and variability metrics for each parameter
- Make conclusions about the available group
- Proceed with assumptions about other related groups if necessary.
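The calculation step in the list above can be sketched as a small summary function applied to each measured parameter. The `describe` helper and the household figures are hypothetical, and the set is treated as the whole population, as in the rest of this text.

```python
import statistics

def describe(values):
    """Central tendency and variability metrics for one parameter."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

# Hypothetical measurements for a group of 8 households.
measurements = {
    "income": [20, 40, 40, 40, 50, 50, 70, 90],  # thousands per year
    "household_size": [1, 2, 2, 3, 4, 4, 4, 5],  # number of people
}

# One summary per parameter.
report = {name: describe(values) for name, values in measurements.items()}

for name, summary in report.items():
    print(name, summary)
```

Conclusions about the group, and any assumptions about related groups, would then be drawn from such a report rather than from the raw measurements.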
Researchers use these methods to condense large sets of data, which are difficult to understand directly, into clear metrics that describe the important qualities of each group. Descriptive statistics helps to structure available information and to develop insights which can be used to take the necessary actions. Any modern organization dealing with lots of data has to use descriptive statistical methods to understand its state of affairs and plan its activities accordingly.