Happy Numbers
Image from Wikipedia
Intro
An integer is happy if, when calculating the sum of the squares of its digits, then the sum of the squares of the digits of the resulting number, and so on, it eventually reaches 1. Otherwise, the number is unhappy.
This give me the opportunity to “play” with 2 different code in Python and check which one is faster. I then compare timings with a C++ implementation.
At the end you should be able to solve your first puzzle on CodinGame. If you never did before, make a try. This is really fun.
First Python Code
Let’s check if 24 is happy. Here is the code :
def sum_of_squared_digits2(n1:int)->int:
n2 = 0
while n1:
digit=n1%10
# n2 += pow(digit, 2)
# n2+=digit**2
n2+=digit*digit
n1=n1//10
n1=n2
return n2
n_set = set()
n = 24
while (n!=1 and n not in n_set):
n_set.add(n)
n = sum_of_squared_digits2(n)
print(n)
print("Happy") if n==1 else print("Unhappy")
This output looks like :
20
4
16
37
58
89
145
42
20
Unhappy
As you can read in the comments, I did some tests with pow()
and **2
but finally using the cached value digit
was more efficient.
Second implementation
Here I try to simplify sum_of_squared_digits()
- The one liner goes like this :
- Convert
n
in a string - Then read each “char” (digit) of the string
- Convert each char as an
int
- Elevate the
int
to power of 2
- Convert
def sum_of_squared_digits(n:int)->int:
return sum([int(i)**2 for i in str(n)])
n_set = set()
n = 24
while (n!=1 and n not in n_set):
n_set.add(n)
n = sum_of_squared_digits(n)
print(n)
print("Happy") if n==1 else print("Unhappy")
No surprise, we get the same output.
However we may want to know which approach is faster. One thing I learned with optimized C++ compiler:
- When it comes to benchmarks, never assume. Measure!
Benchmarking 1 in Python
import time
k_MAX=100_000
def sum_of_squared_digits(n:int)->int:
return sum([int(i)**2 for i in str(n)])
start_time = time.time() # Start timer
for n in range(1, k_MAX+1):
n_set = set()
n_init = n
while (n!=1 and n not in n_set):
n_set.add(n)
n = sum_of_squared_digits(n)
end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")
Execution time: 0.797350 seconds
Benchmarking 2 in Python
import time
k_MAX = 100_000
def sum_of_squared_digits2(n_in:int)->int:
n_out = 0
while n_in:
digit=n_in%10
n_out += digit*digit
n_in=n_in//10
n_in=n_out
return n_out
start_time = time.time() # Start timer
for n in range(1, k_MAX + 1):
n_set = set()
n_init = n
while n != 1 and n not in n_set:
n_set.add(n)
n = sum_of_squared_digits2(n)
end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")
Execution time: 0.482066 seconds
. The code of sum_of_squared_digits2()
is longer but there is no string conversion etc. And we use digit
which should be in the local cache of the processor. This explains the 1.6 speed ratio.
I did some testings calling the script with -O
within a console but I did’nt get any significant improvement. No, I did’nt transpile Python to C.
Benchmarking in C++
- If you don’t have a C++ compiler on your host (shame on you! 😁) you can copy, past and run the code below with an online C++ compiler.
- I selected C++23 and
-02
optimizations and did some test with C++20 and no optimization.
#include <iostream>
#include <unordered_set>
#include <chrono>
constexpr int k_MAX = 100'000;
int sum_of_squared_digits3(int n_in) {
int n_out = 0;
while (n_in != 0) {
int digit = n_in % 10;
n_out += digit * digit;
n_in /= 10;
}
return n_out;
}
int main() {
auto start_time = std::chrono::high_resolution_clock::now();
for (int n = 1; n <= k_MAX; ++n) {
std::unordered_set<int> n_set;
int n_current = n;
while (n_current != 1 && !n_set.contains(n_current)) {
n_set.insert(n_current);
n_current = sum_of_squared_digits3(n_current);
}
}
auto end_time = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed_seconds = end_time - start_time;
std::cout << "Execution time: " << elapsed_seconds.count() << " seconds\n";
return 0;
}
Execution time: 0.083385 seconds
- For such a simple algorithm, the syntax is very similar across languages. So, even if Python is the only programming language you know, you should understand what is happening here. Additionally, I kept the variable names the same.
- 34 lines in C++ vs 24 in Python
- 0.08 sec in C++ vs 0.48 sec in Python
- 6 times faster
Benchmarking C++ V2
#include <iostream>
#include <unordered_set>
#include <array>
#include <chrono>
constexpr int k_MAX = 100'000;
constexpr std::array<int, 10> k_squares = [] {
std::array<int, 10> squares{};
for (int i = 0; i < 10; ++i) {
squares[i] = i * i;
}
return squares;
}();
int sum_of_squared_digits2(int n_in) {
int n_out = 0;
while (n_in != 0) {
int digit = n_in % 10;
n_out += k_squares[digit];
n_in /= 10;
}
return n_out;
}
int main() {
auto start_time = std::chrono::high_resolution_clock::now();
for (int n = 1; n <= k_MAX; ++n) {
std::unordered_set<int> n_set;
int n_current = n;
while (n_current != 1 && !n_set.contains(n_current)) {
n_set.insert(n_current);
n_current = sum_of_squared_digits2(n_current);
}
}
auto end_time = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed_seconds = end_time - start_time;
std::cout << "Execution time: " << elapsed_seconds.count() << " seconds\n";
return 0;
}
The timing is similar.
I suspect the optimizer is doing a great job. I did’nt look at assembly language but you can use Compiler Explorer to do so. It is always very interesting. Read this article to see a real case where looking to assembly language was THE solution.
Again, even on Compiler Explorer, make sure to select C++ 20 or later otherwise the n_set.contains(n_current)
function call is not be available.
Conclusion
- Yes, we could have used numpy and see what happen when vectorizing the algorithm.
- Yes, we could do a better job with some multithreading etc.
- But… It is always a tradeoff between the time spent vs speed improvement
- Regarding speed improvement vs time spent, feel free to read this post about Sieve of Eratosthenes
- Now you should be able to solve this puzzle on CodinGame
PS :
Try this
import time
from functools import lru_cache
k_MAX = 100_000
@lru_cache(maxsize=None)
def sum_of_squared_digits2(n_in: int) -> int:
n_out = 0
while n_in:
digit = n_in % 10
n_out += digit * digit
n_in = n_in // 10
return n_out
start_time = time.time()
for n in range(1, k_MAX + 1):
n_set = set()
n_init = n
while n != 1 and n not in n_set:
n_set.add(n)
n = sum_of_squared_digits2(n)
end_time = time.time()
print(f"Execution time: {end_time - start_time:.6f} seconds")
Execution time: 0.287076 seconds