Cython self-Learning (2)

If you haven’t read the previous article, it is best to read it before this one.

In this article, I try to verify the explanation I gave in the last article for why my Cython script performed so badly. While looking into it, I found this passage:

Because static typing is often the key to large speed gains, beginners often have a tendency to type everything in sight. This cuts down on both readability and flexibility, and can even slow things down (e.g. by adding unnecessary type checks, conversions, or slow buffer unpacking).
- Cython tutorial - Faster code via static typing - Determining where to add types [1]

This implies that if you declare too many types, your script may end up slower than the pure Python version. By the way, there are also some restrictions on declaring NumPy types in Cython:

Cython at a glance, and deciding where to add types [2]
No module-level typed variables with NumPy [3]
Equivalent type declarations between Cython and NumPy [4]

With this material, I started modifying my script; the new version is [here]. These are the places I refined:

Merge function
I removed the run_cliffwalk function and merged the training-related parameters into the qlearn function. This is just to simplify the script. I moved the related loop into the test code, so the total amount of computation does not decrease.
Declare in the right place
In the first version, I declared lots of types on the function signatures but ignored the variables inside those functions. As I understand it, the first version gave no speedup because the computation happens on the local variables rather than on the return values. (You can compare the GetAction and ValueUpdate functions in [v1] and [v2] for more information.)

Project Comparison - cliffwalk_v2

With the refined script, I got a reasonable result.

========== pure python ==========
Record 30 times of executing cliffwalk 3000 times
100%|█████████████████████████████████| 30/30 [00:50<00:00,  1.69s/it]
the fastest result is 1.5165109999943525
the slowest result is 3.2652387999987695
the average result is 1.6859406133327866
========== naive cython   ==========
Record 30 times of executing cliffwalk 3000 times
100%|█████████████████████████████████| 30/30 [00:49<00:00,  1.65s/it]
the fastest result is 1.5883004000061192
the slowest result is 1.838901100010844
the average result is 1.6543756700023853
========== cython   ==========
Record 30 times of executing cliffwalk 3000 times
100%|█████████████████████████████████| 30/30 [00:24<00:00,  1.21it/s]
the fastest result is 0.7837197999760974
the slowest result is 0.8648693999857642
the average result is 0.8234472199973728
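
A summary like the one above (fastest, slowest and average over 30 records) can be produced by a small timing harness. The following is only a minimal sketch of such a harness, not the script actually used for this article: `run_workload` is a hypothetical stand-in for the 3000-episode cliffwalk run, and the function names and default values are assumptions.

```python
import statistics
import timeit


def run_workload(n_episodes):
    """Hypothetical stand-in for one benchmark unit (e.g. n_episodes of training)."""
    total = 0
    for episode in range(n_episodes):
        total += episode % 7
    return total


def benchmark(func, n_episodes=3000, n_records=30):
    """Time func(n_episodes) n_records times and summarise the elapsed seconds."""
    # timeit.repeat returns one elapsed time per record; number=1 means each
    # record is a single call to func(n_episodes).
    times = timeit.repeat(lambda: func(n_episodes), number=1, repeat=n_records)
    return {
        "fastest": min(times),
        "slowest": max(times),
        "average": statistics.mean(times),
    }


summary = benchmark(run_workload)
print("Record 30 times of executing the workload 3000 times")
for label in ("fastest", "slowest", "average"):
    print(f"the {label} result is {summary[label]}")
```

Passing the pure Python, naive Cython and typed Cython entry points to the same `benchmark` function keeps the three measurements comparable.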

Although the improvement is not as dramatic as in [5], the typed Cython version is now about twice as fast as pure Python on average, which finally meets my expectation. Therefore, I think my assumption in the last article about “inappropriate declaration” was right.
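
To put numbers on the comparison, the three reported averages can be turned into speedup factors. This is plain arithmetic on the figures printed above; the variable names are only illustrative.

```python
# Average wall-clock seconds copied from the benchmark output above.
averages = {
    "pure python": 1.6859406133327866,
    "naive cython": 1.6543756700023853,
    "cython": 0.8234472199973728,
}

baseline = averages["pure python"]
# Speedup factor = baseline time / variant time (larger means faster).
speedups = {name: baseline / seconds for name, seconds in averages.items()}

for name, factor in speedups.items():
    print(f"{name:<12} {averages[name]:.3f} s  ->  {factor:.2f}x vs pure python")
```

The naive Cython build comes out at roughly 1.02x, which is effectively no gain, while the version with typed local variables reaches roughly 2.05x.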

There are other methods that could speed things up further, such as calling C functions or writing .pxd declaration files. However, as the quote at the top of this article warns, such tuning “cuts down on both readability and flexibility”, so I think it is better to use these methods only for heavy computation rather than everywhere.

Another assumption

I gave another explanation in the last article: “Cython can’t accelerate a script based on NumPy very much”. After reading some discussions, I think this explanation is incorrect. Even though NumPy is already fast, you can still speed up the script with some more advanced techniques.

First, my script has lots of functions, and Cython has to check the types of the arrays passed to them at run time [6]. According to that discussion, if you have lots of functions and pass arrays between them very often, using a class is a better choice.
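
The shape of that suggestion can be sketched in plain Python. This is not the cliffwalk script itself: the class name `QLearner`, its method names and the default parameter values are hypothetical, and nested lists stand in for the NumPy array so the sketch has no dependencies. The point is only that the Q-table lives on the object, so the methods read it from `self` instead of receiving an array argument on every call.

```python
import random


class QLearner:
    """Hypothetical agent that owns its Q-table instead of passing it around."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        # Nested lists stand in for the NumPy array of the real script.
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def get_action(self, state):
        """Epsilon-greedy choice; reads self.q directly, no array argument."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def value_update(self, state, action, reward, next_state):
        """One Q-learning step: Q[s,a] += alpha * (r + gamma * max Q[s'] - Q[s,a])."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


agent = QLearner(n_states=4, n_actions=2)
agent.value_update(state=0, action=1, reward=-1.0, next_state=1)
print(agent.q[0])
```

In a Cython version this would presumably become a `cdef class` with a typed attribute for the table; that part is an assumption and has not been measured here.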

Besides, a couple of details can make Cython faster [7]. That discussion mentions three methods: turning off bounds checking and wraparound [8], using typed memoryviews [9][10], and declaring arrays as contiguous [9].

However, I think [8] is dangerous for general use, though it is still an option if necessary. A memoryview is similar to a NumPy array, with almost no difference in usage or speed, so I am not sure when or why I should use this technique.
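
To see why [8] is risky, it helps to recall the two Python indexing behaviours that Cython keeps by default and that the `boundscheck(False)` and `wraparound(False)` directives give up. The sketch below shows those default behaviours in plain Python; a list is used as a stand-in for an array, and the helper `read` is hypothetical.

```python
values = [10.0, 20.0, 30.0]


def read(seq, index):
    """Index seq with Python's default semantics and report what happened."""
    try:
        return seq[index]
    except IndexError:
        return "IndexError"


# Wraparound: a negative index is translated to len(seq) + index.
print(read(values, -1))
# Bounds check: an index past the end raises instead of reading stray memory.
print(read(values, 3))
```

With both checks turned off in Cython, neither safety net exists: an out-of-range or negative index is no longer caught and may read or write unrelated memory, so the directives only make sense for code whose indices are known to be valid.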

By the way, lots of discussions suggest using Julia if you want to make your life easier.

Future work

Class version of cliffwalk
Move on to the next topic: GIL
Making life easier with Julia

Reference

[1] Faster code via static typing [link]

[2] Cython for NumPy users [link]

[3] Cython says buffer types only allowed as function local variables even for ndarray.copy() [link]

[4] Difference between np.int, np.int_, int, and np.int_t in cython? [link]

[5] Cythonでnumpyを使った時の速度比較、高速化には何が必要か [link]

[6] How can I use cython to speed up the numpy? [link]

[7] Cython: slow numpy arrays [link]

[8] Cython tutorial - Tuning indexing further [link]

[9] Cython tutorial - Typed Memoryviews [link]

[10] Cython typed memoryviews: what they really are? [link]

Reading related to GIL

Python - Global Interpreter Lock [link]
深入 GIL: 如何寫出快速且 thread-safe 的 Python – Grok the GIL: How to write fast and thread-safe Python [link]
Releasing from GIL in Cython [link]

Extended Reading

Function in C [link]
C-Function [link]
Why Numba and Cython are not substitutes for Julia [link]
Python を高速化する Numba, Cython 等を使って Julia Micro-Benchmarks してみた [link]
