22 August 2020

Combining Python's with-syntax and decorators in a single class

 This notebook shows the differences between using a decorator and the with syntax in Python. Both are demonstrated with the example of measuring the time required to execute a specific part of the code.

With-Syntax

The with syntax is especially useful in cases where resources are set up and freed again after use, such as:

  1. Opening / Closing files
  2. Setting / releasing a lock or semaphore
  3. Ensuring a database commit and rolling back on an exception (see example 3 from here)

To implement one, the ready-to-use AbstractContextManager class is already available; it only requires implementing an __enter__ and an __exit__ method, e.g. for a time-keeping context manager:

In [1]:
from contextlib import AbstractContextManager

import time

class TrackTime(AbstractContextManager):
    def __init__(self):
        pass
    
    def __enter__(self):
        self.start_time = time.time()
        
    def __exit__(self, *args):
        end_time = time.time()
        print(f"Total running time was: {end_time - self.start_time}")
In [2]:
with TrackTime():
    time.sleep(2)
Total running time was: 2.0008251667022705

A decorator can be applied directly to a function to achieve a similar result: tracking the execution time of that function:
In [3]:
def tracktimefunction(func):
    def wrapping_function(*args, **kwargs):
        start_time = time.time()

        result = func(*args, **kwargs)

        end_time = time.time()
        print(f"Total running time was: {end_time - start_time}")
        return result
    return wrapping_function
In [4]:
@tracktimefunction
def wait_2s():
    time.sleep(2)

wait_2s()
Total running time was: 2.0113117694854736

The decorator can easily be integrated into the same class from above, which timed a specific section of code with the with syntax. This is achieved by making the class "callable": the __call__ method is now responsible for wrapping the function internally, and it can be simplified further by using the with syntax with the class instance itself:


In [5]:
from contextlib import AbstractContextManager

import time

class TrackTime(AbstractContextManager):
    def __init__(self):
        pass
    
    def __enter__(self):
        self.start_time = time.time()
        
    def __exit__(self, *args):
        end_time = time.time()
        print(f"Total running time was: {end_time - self.start_time}")

    def __call__(self, func):
        def wrapping_function(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapping_function


Now both the decorator and the with syntax are possible within the same class:

In [6]:
@TrackTime()
def wait_2s():
    time.sleep(2)

wait_2s()

with TrackTime():
    time.sleep(3)
Total running time was: 2.0006048679351807
Total running time was: 3.0028138160705566
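One caveat of the __call__-based decorator: the returned wrapping_function hides the decorated function's name and docstring. If that matters, functools.wraps from the standard library fixes it; here is a sketch of the same class with that one addition (the function name wait_briefly is only illustrative):

```python
import time
from contextlib import AbstractContextManager
from functools import wraps

class TrackTime(AbstractContextManager):
    def __enter__(self):
        self.start_time = time.time()

    def __exit__(self, *args):
        end_time = time.time()
        print(f"Total running time was: {end_time - self.start_time}")

    def __call__(self, func):
        @wraps(func)  # preserve func.__name__ and func.__doc__ on the wrapper
        def wrapping_function(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapping_function

@TrackTime()
def wait_briefly():
    """Sleep briefly."""
    time.sleep(0.1)

wait_briefly()                # prints the running time via __exit__
print(wait_briefly.__name__)  # "wait_briefly" instead of "wrapping_function"
```

Without @wraps, the last line would print "wrapping_function", which is confusing in tracebacks and when using help().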

07 August 2017

Optimize rsync performance to remote server

A fast and easy method to synchronize or back up files from one computer to another is rsync. This post first briefly introduces rsync and the command line parameters I use most, and then outlines some modifications that greatly speed up the data transfer from about 20-25 MB/s to over 90 MB/s in a local, non-metered network!

Short description of rsync

The rsync protocol is able to efficiently synchronize files between computers. It does this by transferring only changed files and, in addition, it tries to transfer only the differences between the local and the remote file to further minimize the transferred data volume.
The underlying data transfer is secured with the Secure Shell (SSH), which adds a layer of computational overhead: encrypting the data, transferring it to the remote server, and decrypting it again. Because I am mostly on the road with a metered internet connection, I enable the maximum possible compression directly in my main SSH configuration, which becomes the bottleneck when using rsync in a fast local network. In the latter case, disabling the compression and choosing a cheaper SSH cipher can increase the rsync performance a lot!
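The delta-transfer idea can be illustrated with a toy sketch in Python. This is not rsync's actual algorithm (rsync uses a rolling checksum so that matching blocks can be found at any byte offset); the fixed-size blocks here are purely for illustration:

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, just for demonstration

def block_hashes(data):
    """Hash every fixed-size block of the receiver's copy of the file."""
    return {hashlib.md5(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)}

def blocks_to_send(old, new):
    """Return only the blocks of `new` that the receiver does not already have."""
    known = block_hashes(old)
    return [new[i:i + BLOCK_SIZE]
            for i in range(0, len(new), BLOCK_SIZE)
            if hashlib.md5(new[i:i + BLOCK_SIZE]).hexdigest() not in known]

old = b"abcdefghijkl"
new = b"abcdXXXXijkl"
print(blocks_to_send(old, new))  # [b'XXXX'] -- only the changed block travels
```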

Description of used command line parameters

My default command line parameters with descriptions from the full rsync man page:

  • -a: archive mode (equals -rlptgoD): recursive, copy symlinks as symlinks, preserve permissions, preserve modification times, preserve group, preserve owner, preserve device files and special files
  • -v: increase verbose information during the transfer
  • -u: skip files that are newer on the receiver
  • -r: recurse into directories
  • --progress: show progress during transfer
  • --delete: delete extraneous files from the remote server

Remove SSH bottleneck to optimize performance of rsync

The maximum compression of SSH (used by default in my config) is helpful on a metered connection with small bandwidth, where the reduced data volume saves time. In a fast local network, it turns out to be the bottleneck and manifests itself in 100% CPU usage of the SSH process. There, it is much faster to transfer the files as they are, without any compression, because compressing and decompressing would take longer than just transferring the plain files.

In order to do that, these options can be used to speed up the data transfer in the local network:

  • -T: disable pseudo-tty allocation on the destination
  • -c aes128-ctr: select a weaker but faster SSH cipher. Other guides suggest arcfour, which would require manual modification of ssh_config on the destination host; that is not always possible, and aes128-ctr worked just fine for me.
  • -x: disable X11 forwarding
  • -o Compression=no: disable SSH compression bottleneck described above

The full command to back up the folder foo to the remote folder bar on the destination host desthost is then:

export RSYNC_RSH="ssh -T -c aes128-ctr -o Compression=no -x"
rsync -avur --progress --delete foo desthost:bar

With this command, it was possible to increase the transfer rates from about 20-25 MB/s to more than 90 MB/s!

02 November 2015

"On-the-fly" Coupled Cluster Path Integral Molecular Dynamics

In this post, I'll show some supplementary details in addition to our "on-the-fly" coupled cluster path integral molecular dynamics paper [1].

In physical chemistry, there are many examples where molecular dynamics is used to sample atomistic systems. The electronic structure of such systems can be described with various methods. One very accurate ab initio method (i.e. one that does not require empirical parameters to describe the electronic structure), but also a computationally expensive one, is coupled-cluster (CC) theory. In our paper, we used it to calculate the interatomic forces in a molecular dynamics simulation of the protonated water dimer [1]. We note in passing that this is, to our knowledge, the first CC-based molecular dynamics simulation; we will also refer to it as the classical simulation in the following.
The forces have been calculated with cfour and the dynamics with a modified version of i-PI. To negotiate between cfour and i-PI, I wrote a wrapper that hands over the positions from i-PI to cfour and the forces from cfour back to i-PI. If anyone is interested in obtaining the wrapper, feel free to drop me a mail.

Here is a movie that depicts the classical point particles and the hopping of the proton between the two water molecules.

The nuclei can also be treated as quantum mechanically blurred particles. Path integral molecular dynamics (together with the Born-Oppenheimer approximation) provides an easy and (at least after a while) intuitive approach, where each particle is replaced by a closed ring polymer with harmonic springs between adjacent beads. Pictorially, this would look like

where each of the two particles is represented by a closed P-bead ring polymer and the interatomic potential needs to be evaluated P times. In this case, P has been chosen as 6, but the required value depends on the quantum nature of the system. Generally speaking, nuclear quantum effects increase with decreasing temperature and particle masses.
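The harmonic spring term of the ring polymer described above can be sketched in a few lines of Python. This is only a schematic illustration in arbitrary units with scalar (1D) bead positions; in an actual PIMD code the positions are 3N-dimensional vectors and the spring frequency is fixed by P, the temperature, and ħ:

```python
def ring_polymer_spring_energy(beads, mass, omega_P):
    """Harmonic spring energy of one closed P-bead ring polymer (1D sketch).

    The modulo index couples the last bead back to the first,
    which is what closes the ring.
    """
    P = len(beads)
    return sum(0.5 * mass * omega_P**2 * (beads[k] - beads[(k + 1) % P])**2
               for k in range(P))

# All beads on top of each other: the springs are fully relaxed.
print(ring_polymer_spring_energy([1.0] * 6, mass=1.0, omega_P=2.0))  # 0.0
```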

The quantum simulation with quantum mechanically blurred particles and with P=32 now looks like

which is, to our knowledge, the first CC-based PIMD simulation.
For more details about the simulation, see the publication [1].

References:
[1] T. Spura, H. Elgabarty and T. D. Kühne, Phys. Chem. Chem. Phys., 2015, 17, 14355-14359 DOI: 10.1039/C4CP05192K

06 August 2015

Publication ready figures with matplotlib and Jupyter notebook

A very convenient workflow to analyze data interactively in Python is to use the Jupyter Notebook (formerly IPython Notebook) in combination with matplotlib. The resulting figures can be used in various ways and file formats depending on the kind of publication.
For instance, to put a figure on a webpage, most software supports only png or jpg formats, so a fixed resolution must be provided. On the other hand, a scalable figure format can be scaled as needed and readily used in a pdf document, such as a pgf image in a journal that uses LaTeX. This way, there won't be artifacts when zooming into the figure. This blog post contains a small function that saves a matplotlib figure to various file formats automatically, which can then be used for various occasions.

Creating a simple plot

A simple sample plot can be created within a Jupyter notebook with:
  • Loading matplotlib and setting up jupyter notebook to display the graphics inline:
In [1]:
%matplotlib inline
import seaborn as snb
import numpy as np
import matplotlib.pyplot as plt
  • Creating a quadratic plot:
In [2]:
def create_plot():
    x = np.arange(0.0, 10.0, 0.1)
    plt.plot(x, x**2)
    plt.xlabel("$x$")
    plt.ylabel("$y=x^2$")
    
create_plot()
plt.show()

Save the figure

The previous figure can be saved by calling savefig from matplotlib.pyplot; matplotlib infers the output format from the extension of the filename. To save to several formats, one would need to call this function several times, or instead define a new function that can be included as boilerplate in the first cell of a notebook, such as:
In [3]:
def save_to_file(filename, fig=None):
    """Save to @filename with a custom set of file formats.
    
    By default, this function takes the most recent figure,
    but a @fig can also be passed to this function as an argument.
    """
    formats = [
                "pdf",
                "eps",
                "png",
                "pgf",
              ]
    if fig is None:
        fig = plt.gcf()  # grab the most recent figure
    for form in formats:
        fig.savefig(f"{filename}.{form}")
The figure can then easily be saved with:
In [4]:
create_plot()
save_to_file("simple_plot")
My choice of formats is the png format to put the figure online, such as on a web page, plus several scalable formats to include it in a pdf. I will write more on how to do that with LaTeX in a future blog post.
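As a minimal sketch (assuming the files generated by save_to_file above), the pdf version can be included in a LaTeX document like this:

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Without an extension, pdflatex picks up simple_plot.pdf automatically.
\includegraphics[width=0.8\linewidth]{simple_plot}
\end{document}
```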
The full notebook for creating the figure above and with the boilerplate in the first cell of the notebook can be found at github.


Edited:

  • August 09, 2017: Use jupyter instead of ipython in the text.