Standard Deviation in Coding: Calculation Methods Explained

Standard deviation is a statistical measure of how spread out the values in a dataset are around their mean. The smaller it is, the more homogeneous (uniform) the dataset [1].

The value of standard deviation is either 0 or a positive number. It can never be negative, because it is the square root of an average of squared differences:

  • If the standard deviation is 0, every element in the dataset is identical.
  • If it is greater than 0, at least some data points are smaller or larger than the average. The larger the value, the more widely the data is spread around the mean.
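
Both cases can be checked quickly. The following is a minimal sketch using Python's standard statistics module; the two lists are made-up values chosen only for illustration:

```python
import statistics

# Hypothetical datasets, chosen only to illustrate the two cases.
identical = [5, 5, 5, 5]
varied = [2, 4, 6, 8]

sd_identical = statistics.pstdev(identical)  # every value equals the mean
sd_varied = statistics.pstdev(varied)        # values are spread around the mean

print(sd_identical)   # 0.0
print(sd_varied > 0)  # True
```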

Standard Deviation Formulas

There are two formulas for calculating standard deviation. The first formula is for the population data, and the second is for the sample data [2].

Here are the formulas:

  • Population Standard Deviation:

    σ = √( Σ(xᵢ – μ)² / N )

    Where:

    • σ = Population standard deviation
    • Σ = Summation
    • xᵢ = Each value in the population
    • μ = Population mean
    • N = Number of values in the population
  • Sample Standard Deviation:

    s = √( Σ(xᵢ – x̄)² / (n-1) )

    Where:

    • s = Sample standard deviation
    • Σ = Summation
    • xᵢ = Each value in the sample
    • x̄ = Sample mean
    • n = Number of values in the sample

The difference between these two formulas lies only in the divisor. The divisor for population data is the total number of data points (N), and the divisor for sample data is the total number of data points minus one (n-1).
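
To make the role of the divisor concrete, here is a small sketch of both formulas in one function. The name `standard_deviation`, its `sample` parameter, and the `scores` list are hypothetical and exist only for this illustration:

```python
import math

def standard_deviation(values, sample=True):
    """Return the sample (divide by n - 1) or population (divide by N) standard deviation."""
    n = len(values)
    if sample and n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n  # the only difference between the two formulas
    return math.sqrt(sum_of_squares / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

population_sd = standard_deviation(scores, sample=False)
sample_sd = standard_deviation(scores, sample=True)

print(population_sd)              # 2.0
print(sample_sd > population_sd)  # True
```

Because the sample formula divides by a smaller number, it always gives a result at least as large as the population formula for the same data.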

Example Calculation

Let’s work through a calculation by hand first. If we can calculate standard deviation on paper, we can also calculate it with a programming language.

For example, let’s consider the following dataset:

data = [90, 84, 88, 83, 87, 85, 83, 71]

First, calculate the mean (average) of the data:

===== Calculate Mean =====
mean = (90 + 84 + 88 + 83 + 87 + 85 + 83 + 71) / 8
mean = 83.875

After getting the average value, we can start calculating the variance:

===== Calculate Variance =====
(90 - 83.875) ^ 2 = 37.515625
(84 - 83.875) ^ 2 = 0.015625
(88 - 83.875) ^ 2 = 17.015625
(83 - 83.875) ^ 2 = 0.765625
(87 - 83.875) ^ 2 = 9.765625
(85 - 83.875) ^ 2 = 1.265625
(83 - 83.875) ^ 2 = 0.765625
(71 - 83.875) ^ 2 = 165.765625
total = 37.515625 + 0.015625 + 17.015625 + 0.765625 + 9.765625 + 1.265625 + 0.765625 + 165.765625
total = 232.875
variance = 232.875 / (8 - 1) # Using (n-1) for sample variance
variance = 33.267857142857146

From there, we can then calculate the standard deviation by taking the square root of the variance value:

===== Standard Deviation =====
s = square root of variance
s = square root of (33.267857142857146)
s = 5.767829500154902

Before We Start Coding

Before starting to code, I assume you are familiar with the basics of Python, especially the following three topics:

The program we are going to write relies heavily on these three topics.

1. Calculating Standard Deviation Manually in Python

From the worked example above, we already have an outline of the program we are going to write:

  1. First, calculate the average (mean).
  2. Then, find the variance.
  3. Lastly, we take the square root (sqrt) of the variance value.

Here’s roughly the program code for the first step:

import statistics

data = [90, 84, 88, 83, 87, 85, 83, 71]
mean = statistics.mean(data)

Next step, calculate the variance:

list_variance = []
for number in data:
  list_variance.append(
    (number - mean) ** 2
  )
variance = sum(list_variance) / (len(data) - 1)

Or, if you prefer a more compact version, the loop above can be replaced with a list comprehension:

list_variance = [(number - mean) ** 2 for number in data]
variance = sum(list_variance) / (len(data) - 1)

In the last step, we calculate the standard deviation using the math.sqrt() function (to find the square root):

import math

standard_deviation = math.sqrt(variance)

print(f'data \t\t -> {data}')
print(f'standard deviation \t -> {standard_deviation}')

Output:

data         -> [90, 84, 88, 83, 87, 85, 83, 71]
standard deviation   -> 5.767829500154902
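
The three steps can also be collected into one short script. The sketch below is a variant rather than the exact code from this session: as an assumption made for illustration, it computes the mean with sum() and len() instead of statistics.mean(), so it needs nothing beyond the math module:

```python
import math

data = [90, 84, 88, 83, 87, 85, 83, 71]

# Step 1: mean (average)
mean = sum(data) / len(data)

# Step 2: sample variance, dividing by n - 1
variance = sum((number - mean) ** 2 for number in data) / (len(data) - 1)

# Step 3: standard deviation is the square root of the variance
standard_deviation = math.sqrt(variance)

print(mean)  # 83.875
print(standard_deviation)
```

The printed standard deviation should agree with the value worked out by hand earlier, up to floating-point rounding.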


2. Calculating Standard Deviation Using statistics.stdev() Function

The second method is quite easy. We only need to call the functions that the statistics module already provides:

  • statistics.stdev() function: to calculate the standard deviation of sample data.
  • statistics.pstdev() function: to calculate the standard deviation of population data.

Here is the program code:

import statistics

data = [90, 84, 88, 83, 87, 85, 83, 71]

standard_deviation = statistics.stdev(data)
population_standard_deviation = statistics.pstdev(data)

print(standard_deviation)
print(population_standard_deviation)

Output:

5.767829500154902
5.395310463726809
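
The statistics module also provides variance() and pvariance(), which return the sample and population variance. As a small sanity check, the sketch below confirms that each standard deviation is the square root of the matching variance. It compares with math.isclose() rather than ==, since floating-point results may differ in the last digits:

```python
import math
import statistics

data = [90, 84, 88, 83, 87, 85, 83, 71]

sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by N

# Each standard deviation should equal the square root of its variance.
sample_matches = math.isclose(statistics.stdev(data), math.sqrt(sample_variance))
population_matches = math.isclose(statistics.pstdev(data), math.sqrt(population_variance))

print(sample_matches)      # True
print(population_matches)  # True
```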


Complete Program Code

The complete program code from this session is available in the python-latihan-logika repository on GitHub.

Don’t forget to give a star! ⭐🌟

Next Session

God willing, in the next session, we will create a simple calculator program.

How to do it?

Keep following the Python logic exercise tutorial on jagongoding!

If you have any questions or anything you want to discuss, or even tutorial requests, don’t hesitate to comment, okay? 😁

Thank you very much!

References

[1] https://gurubelajarku.com/simpangan-baku/ – accessed April 24, 2021
[2] https://stackabuse.com/calculating-variance-and-standard-deviation-in-python/ – accessed April 24, 2021
