“Programs are meant to be read by humans and only incidentally for computers to execute.”
―
Python, a programming language that supports multiple programming paradigms and was first released in 1991 (yes, earlier than Java!), has become widely accepted thanks to the way it makes programming easier in scientific computing applications. This wonderful language has been my companion throughout my life as a research student, with its excellent libraries and ease of use, especially its REPL.
I was excited to attend EuroSciPy 2018, which took place in Trento, a beautiful city in an Italian valley. I attended the main conference, held on the 30th and 31st of August 2018. It showcased the use of Python in different scientific applications, with talks given by people from academia as well as industry who used Python to get their jobs done. It was interesting to observe that most of the talks mentioned the use of Python in machine learning and data science. Also, various Python libraries that people had initially developed for use within their research groups or companies have been made open source, so that others can contribute to their further development.
A few talks stood out to me, and I would like to mention some points about them.
One of these was about a Python library called imbalanced-learn, presented by Guillaume Lemaitre. The library helps make more accurate predictions when the training data set is skewed, with the samples in some classes being comparatively much fewer in number. For example, in problems such as cancer-cell detection, solar-wind records, and car-insurance claims, the ratio of data samples across classes can be as high as 26:1. The approaches used to tackle this involve a combination of unsupervised learning (outlier detection), semi-supervised learning (novelty detection), and supervised learning (resampling).
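To make the resampling idea concrete, here is a minimal sketch of random oversampling in plain Python. Note this is only an illustration of the concept, not imbalanced-learn's actual API (the library provides ready-made samplers for this); the toy data and function name below are my own.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Pool of existing samples for this class, to draw duplicates from
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_res.append(rng.choice(pool))
            y_res.append(label)
    return X_res, y_res

# Skewed toy data: 6 "healthy" samples vs only 2 "cancer" samples (3:1)
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [5.1]]
y = ["healthy"] * 6 + ["cancer"] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 samples
```

In practice one would prefer the library's samplers (or smarter schemes that synthesize new minority samples rather than duplicating them), but the balancing goal is the same.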
Various researchers from the biomedical community were also present at the conference, explaining how Python and its libraries can solve interesting problems in the biomedical field, such as named-entity recognition (using a library called OGER), dimensionality reduction in neuroscience (using techniques like Tensor Component Analysis and demixed PCA, in addition to standard PCA), and Chaosolver, which helps determine phase-space dynamics in biomedical applications.
Another talk I found particularly pragmatic was titled ‘How not to screw up with Machine Learning in Production’. It focused on the components of a machine learning system that are essential for production beyond the core models themselves, such as handling training/serving skew and data validation. The talk suggested using existing solutions or a hybrid approach instead of building the entire ML ecosystem from scratch, pointing to tools such as TensorFlow Serving, Clipper, Apache PredictionIO, and Seldon Core/Kubeflow.
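As a rough sketch of what data validation can catch, the snippet below infers a simple type schema from training data and flags serving-time records that drift from it, one common source of training/serving skew. This is a toy of my own devising, not the API of any of the tools named above.

```python
def infer_schema(records):
    """Map each feature name to the set of Python type names seen in training."""
    schema = {}
    for rec in records:
        for name, value in rec.items():
            schema.setdefault(name, set()).add(type(value).__name__)
    return schema

def validate(record, schema):
    """Return a list of human-readable anomalies for one serving-time record."""
    problems = []
    for name in schema:
        if name not in record:
            problems.append(f"missing feature: {name}")
        elif type(record[name]).__name__ not in schema[name]:
            problems.append(f"type mismatch for {name}")
    for name in record:
        if name not in schema:
            problems.append(f"unexpected feature: {name}")
    return problems

# Hypothetical training data establishes the expected schema
train = [{"age": 34, "country": "IT"}, {"age": 29, "country": "DE"}]
schema = infer_schema(train)
print(validate({"age": "34", "country": "IT"}, schema))  # age arrives as a string
print(validate({"age": 41}, schema))                     # country is missing
```

Production-grade validators check far more (value ranges, distributions, missing-rate drift), but even a check this simple catches the silent serving-time failures the talk warned about.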
This was my first time attending an international tech conference, and it gave me many valuable experiences and insights. I am sure I will find good use for the open-source Python libraries introduced to me at the conference. Getting to know the speakers and discussing my interests and challenges with them has also widened my horizons. Now that I am back, what remains are the memories of the kind people I met in Trento and some wise words and thoughts from the speakers.