Python/Harmattan/Performance Considerations for Python Apps

Latest revision as of 17:52, 24 November 2011

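Based on "Making Python faster (for fmms initially)" (http://talk.maemo.org/showthread.php?t=50737&highlight=fmms) and "Python Qt startup time tips" (http://talk.maemo.org/showthread.php?t=47850).

See the python.org page (http://wiki.python.org/moin/PythonSpeed/PerformanceTips) and Khertan's writeup (http://talk.maemo.org/showpost.php?p=623199&postcount=18) for general Python optimizations.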

Profiling

Do not worry about performance unless you notice a problem. Then only optimize what you can justify with profiling.

To profile Python code, run it with:

$ python -m cProfile -o .profile TOPLEVEL_SCRIPT.py

Then analyze the results:

$ python -m pstats .profile
> sort cumulative
> stats 40

This sorts the results by cumulative time (the time spent in a function plus all the functions it called) and then displays the top 40 entries.
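The same measurement can also be taken from inside a program, which is handy when only one code path is of interest. The following is a minimal sketch, assuming a Python 3 interpreter; `slow_sum` and `profile_call` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive work so that something shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Equivalent to "sort cumulative" followed by "stats 40" in the pstats shell.
    stats.sort_stats("cumulative").print_stats(40)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 100000)
    print(report)
```

Wrapping only the suspect call keeps interpreter startup and imports out of the report, which may or may not be what you want when startup time is the problem.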

See the python.org page (http://docs.python.org/library/profile.html) for more information on profiling.

Improving Performance

Interpreter Choice

Unladen Swallow

PEP 3146 - Merging of Unladen Swallow (http://www.python.org/dev/peps/pep-3146/)

Currently Unladen Swallow shows little performance benefit, while it starts up more slowly and uses more memory.

Psyco / Cython

Compiles a restricted subset of Python into a Python extension module.

Do these work with ARM?

ShedSkin

ShedSkin is a tool that converts a restricted subset of Python into C++. The result can then be compiled and used as a module in Python. Tests have shown that great speed improvements are possible. You do not need any knowledge of C or of how to use gcc.

Main article: ShedSkin

Delegating to C with CTypes/SWIG

??

Startup

/usr/bin/python Startup

Preloaders such as PyLauncher (http://pylauncher.garage.maemo.org/) keep a Python process around with heavyweight imports like gtk already loaded. On application launch, the preloader process is forked.

Preloaders were favored back in the Maemo 4.1 days but have fallen out of favor lately. The concern is that a Python process with heavy modules imported is kept around at all times, even when it is unused (http://talk.maemo.org/showpost.php?p=622358&postcount=10).

Parsing .py files

Stripping the Code

A major downside is that the code your users run differs from the code you develop with. This makes any stack traces that users provide a bit more complicated to decipher.

Benchmarks from stripping code (http://talk.maemo.org/showpost.php?p=627311&postcount=28):

First test - normal code

2104 lines of code
580 blank lines
215 code lines
Load time from icon click to fully loaded - 10.04 seconds

Second Test - Cleared up code

2104 lines of code
0 blank lines
80 code lines
Load time from icon click to fully loaded - 9.25 seconds

Third test - cleared up code

1469 lines of code
0 blank lines
80 code lines
Load time from icon click to fully loaded - 8.40 seconds (5 tests, from 8.09 to 8.60)

Generating pyc/pyo files

After importing a module, Python serializes the compiled bytecode so that it does not have to re-parse the source the next time. These files are saved next to the .py files, which means that if the user does not have write access to that directory, Python will not be able to cache them.

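Generating pyc/pyo files should be done in the package postinst/postrm scripts, per Debian Python Policy (http://talk.maemo.org/showpost.php?p=628464&postcount=35).

Approaches:

  • py_compilefiles src/*.py (http://talk.maemo.org/showpost.php?p=627477&postcount=31)
  • python-support is very easy to use: add dh_pysupport to debian/rules and python-support to Build-Depends and Depends, and make sure that postinst has #DEBHELPER# somewhere (http://talk.maemo.org/showpost.php?p=634286&postcount=52)
  • python -m compileall TOPLEVEL.py (http://talk.maemo.org/showpost.php?p=644005&postcount=60)

pyo Files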

A decent description of pyo files (http://www.network-theory.co.uk/docs/pytut/CompiledPythonfiles.html):

  • When the Python interpreter is invoked with the -O flag, optimized code is generated and stored in ‘.pyo’ files. The optimizer currently doesn't help much; it only removes assert statements. When -O is used, all bytecode is optimized; .pyc files are ignored and .py files are compiled to optimized bytecode.
  • Passing two -O flags to the Python interpreter (-OO) will cause the bytecode compiler to perform optimizations that could in some rare cases result in malfunctioning programs. Currently only __doc__ strings are removed from the bytecode, resulting in more compact ‘.pyo’ files. Since some programs may rely on having these available, you should only use this option if you know what you're doing.
  • A program doesn't run any faster when it is read from a ‘.pyc’ or ‘.pyo’ file than when it is read from a ‘.py’ file; the only thing that's faster about ‘.pyc’ or ‘.pyo’ files is the speed with which they are loaded.
  • When a script is run by giving its name on the command line, the bytecode for the script is never written to a ‘.pyc’ or ‘.pyo’ file. Thus, the startup time of a script may be reduced by moving most of its code to a module and having a small bootstrap script that imports that module. It is also possible to name a ‘.pyc’ or ‘.pyo’ file directly on the command line.
  • The module ‘compileall’ can create ‘.pyc’ files (or ‘.pyo’ files when -O is used) for all modules in a directory.
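As an illustration of the last point, byte-compilation can also be driven from Python, for example from a packaging helper. This is a minimal sketch, assuming a Python 3 interpreter (which writes its cache into a `__pycache__` subdirectory rather than next to the source); `precompile` and `heavy_module.py` are hypothetical names.

```python
import compileall
import os
import tempfile


def precompile(source_dir):
    """Byte-compile every .py file under source_dir.

    Returns (success flag, sorted list of bytecode files found afterwards).
    """
    # quiet=1 prints only errors; the return value is true on success.
    ok = compileall.compile_dir(source_dir, quiet=1)
    compiled = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith((".pyc", ".pyo")):
                compiled.append(os.path.join(root, name))
    return bool(ok), sorted(compiled)


if __name__ == "__main__":
    # Throwaway directory holding one small module, for demonstration only.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "heavy_module.py"), "w") as handle:
        handle.write("def main():\n    return 'started'\n")
    print(precompile(demo_dir))
```

Combined with the bootstrap advice above, the launcher script would contain little more than an import of the heavy module and a call to its entry point, so that almost all of the code is loaded from cached bytecode.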

Delayed work

With Dialcentral, epage found that caching the results of a good number of re.compile calls at object creation time (which occurs in a background thread) saved a significant amount of startup time. Even profiling up to the point where the background thread finished showed a speedup. epage suspects this is due to fewer class variable assignments and lookups, which might be more expensive than the equivalent on an instance.

Sadly, this was done early in the Dialcentral release cycle and no performance numbers are available to back up these claims. It was considered significant enough at the time to justify making the code slightly uglier.
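The general shape of that change might look like the sketch below. It is not Dialcentral's code: `NumberParser`, `LazyBackend` and the two patterns are hypothetical, and the only point is that the patterns are compiled when an instance is created in a background thread rather than when the module is imported.

```python
import re
import threading


class NumberParser(object):
    """Hypothetical class: patterns are compiled per instance, not at import."""

    def __init__(self):
        # Compiled here, so merely importing the module does no regex work.
        self._digits = re.compile(r"\d+")
        self._separators = re.compile(r"[\s().-]+")

    def normalize(self, text):
        """Strip separators and return only the digits of a phone number."""
        return "".join(self._digits.findall(self._separators.sub("", text)))


class LazyBackend(object):
    """Builds the parser in a background thread while the UI starts up."""

    def __init__(self):
        self._parser = None
        self._ready = threading.Event()
        thread = threading.Thread(target=self._build)
        thread.daemon = True
        thread.start()

    def _build(self):
        self._parser = NumberParser()
        self._ready.set()

    def normalize(self, text):
        # Blocks only if the background thread has not finished yet.
        self._ready.wait()
        return self._parser.normalize(text)


if __name__ == "__main__":
    print(LazyBackend().normalize("(555) 123-4567"))
```

Whether this helps in a given application is something only profiling can show, as noted above.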

Perceived Startup Performance

hildon_gtk_window_take_screenshot (http://lists.maemo.org/pipermail/maemo-developers/2009-October/021478.html) takes advantage of user perception to make the app appear to launch faster.

Responsiveness

Thread per Logical Unit

The One Ring has separate threads for its D-Bus logic and its networking logic. It does this separation through a worker thread that the D-Bus thread posts tasks to. Results come as callbacks in the D-Bus thread.

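See AsyncLinearExecutor (https://garage.maemo.org/plugins/ggit/browse.php/?p=theonering;a=blob;f=src/util/go_utils.py;h=20ccac19201a4eae1b019fe2861e120b92010f55;hb=835f22ccd4b8e357bcbbc364d451e98721401e72) and some example code (https://garage.maemo.org/plugins/ggit/browse.php/?p=theonering;a=blob;f=src/channel/call.py;h=aefc59861a1cede286d4094bf8fc35101f2ff19d;hb=835f22ccd4b8e357bcbbc364d451e98721401e72).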
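The pattern itself can be sketched with only the standard library. This is not The One Ring's AsyncLinearExecutor; `Worker` and `drain` are hypothetical names, and the sketch assumes Python 3 (where the module is called `queue`; in Python 2 it was `Queue`).

```python
import queue
import threading


class Worker(object):
    """Runs posted tasks one at a time on a single background thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.results = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def post(self, func, args, on_done):
        """Queue func(*args); on_done is later called with its result."""
        self._tasks.put((func, args, on_done))

    def stop(self):
        """Finish the tasks already queued, then end the thread."""
        self._tasks.put(None)
        self._thread.join()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:
                break
            func, args, on_done = task
            # Hand the callback back instead of calling it here, so that it
            # runs on the posting thread rather than on the worker thread.
            self.results.put((on_done, func(*args)))


def drain(worker):
    """Call from the posting thread to deliver finished results."""
    while not worker.results.empty():
        on_done, value = worker.results.get()
        on_done(value)


if __name__ == "__main__":
    received = []
    worker = Worker()
    worker.post(pow, (2, 10), received.append)
    worker.stop()
    drain(worker)
    print(received)
```

In a GTK application the hand-off back to the main thread would more likely be scheduled on the main loop, for instance with gobject.idle_add, instead of being polled with a function like `drain`.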

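Splitting a call between multiple callbacks

See Khertan's approach (http://talk.maemo.org/showpost.php?p=623199&postcount=18).

epage's approach (https://garage.maemo.org/plugins/ggit/browse.php/?p=gc-dialer;a=blob;f=src/gtk_toolbox.py;h=17dc166b5c37c5d15efc6eb7baef52b3250544ac;hb=a0b504bcfb10c73989fbb8578f073e81f4f7edbd):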

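 import functools
 def make_idler(func):
         """
         Decorator that makes a generator-function into a function that will continue execution on next call
         """
         a = []  # holds the running generator between calls
         @functools.wraps(func)
         def decorated_func(*args, **kwds):
                 if not a:
                         a.append(func(*args, **kwds))
                 try:
                         next(a[0])  # run up to the next yield
                         return True  # more work remains, keep the idle callback
                 except StopIteration:
                         del a[:]
                         return False  # finished, the idle callback is removed
         return decorated_func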

Example:

 import gobject
 @make_idler
 def func():
   # ... long code ...
   yield
   # ... long code ...
   yield
   # ... long code ...
   yield
   # ... long code ...
   yield
 # ...
 # func is already wrapped by the decorator, so it is registered directly
 gobject.idle_add(func)
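To see how the work is sliced up, the decorator can be exercised without GTK. The sketch below repeats make_idler so that it runs on its own, and replaces the main loop with a hypothetical `run_idle_loop` that calls the callback until it returns False, which is how an idle handler is treated by the main loop. `load_everything` and its steps are made up for illustration.

```python
import functools


def make_idler(func):
    """Same idea as the decorator above, repeated so this sketch is standalone."""
    pending = []

    @functools.wraps(func)
    def decorated_func(*args, **kwds):
        if not pending:
            pending.append(func(*args, **kwds))
        try:
            next(pending[0])
            return True   # more work left: keep the idle handler installed
        except StopIteration:
            del pending[:]
            return False  # finished: the idle handler can be removed

    return decorated_func


def run_idle_loop(callback):
    """Stand-in for the main loop: call until the callback returns False."""
    calls = 0
    while True:
        calls += 1
        if not callback():
            return calls


steps = []


@make_idler
def load_everything():
    steps.append("read config")
    yield
    steps.append("build model")
    yield
    steps.append("fill view")


if __name__ == "__main__":
    print(run_idle_loop(load_everything), steps)
```

Each yield marks a point where control returns to the main loop, so the UI can process events between the slices of a long operation.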

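Memory Usage

Use __slots__ (http://docs.python.org/reference/datamodel.html#slots) on classes with many instances to avoid the cost of a per-instance __dict__.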
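A minimal sketch of the difference, with hypothetical class names: the slotted class declares its attributes up front, so instances carry no dictionary and cannot grow new attributes.

```python
class PointWithDict(object):
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots(object):
    """Only the listed attributes exist; no per-instance __dict__ is created."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


if __name__ == "__main__":
    print(hasattr(PointWithDict(1, 2), "__dict__"))
    print(hasattr(PointWithSlots(1, 2), "__dict__"))
```

The saving only matters when many instances are alive at once, and the trade-off is that attributes can no longer be added to instances at run time.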

FAQ

Is Python slow?

The standard response is "it depends". For a graphical application that does not do much processing, a user will probably not notice that it is written in Python. Compare that to an experiment by epage, who wrote a GST video filter in Python that at best ran at 2 seconds per frame.

Further reading

  • PyQt Tips and Tricks - similar guide for PyQt