Open Source Contribution - Release 0.2.5

Contributing to Pandas (Part 2)



Pandas is a well-known open-source library for Python that provides high-performance, easy-to-use data structures and data analysis tools. It is often used in the machine learning field. The Pandas project has over 1,300 contributors and 18,000 commits, which makes it one of the biggest open-source projects related to machine learning.

Hacktoberfest 2018

This year I am participating in Hacktoberfest. Throughout the month of October, I need to make 5 pull requests with contributions to one or more projects on GitHub. This is my fifth contribution to Hacktoberfest, and I came across the Pandas project on GitHub while working on my previous one.

Throughout this Hacktoberfest I'm trying to find different ways to contribute to the open-source community. I've already explored non-code contributions and how they can be useful. I also explored contributing to a startup project. Now, in order to gain experience with working on a large open-source project, I will keep contributing to Pandas.


I got lucky finding my first issue in Pandas: I caught a brand new issue almost as soon as it was posted and commented right away that I wanted to contribute. To find my second issue, however, I had to do more research and look into older issues that hadn't been resolved. That is how I found an issue that somebody had abandoned a month earlier, and I expressed my interest in solving it.


My Contribution

One of Pandas' developers brought attention to an issue that encouraged bad coding practices. Pandas uses flake8, a tool for enforcing a coding style guide. One of the warnings flake8 reports is the bare except warning (E722), which asks that each except statement name the specific errors it handles. However, this warning was on the project's ignore lists, which allowed developers to write bare except statements when handling errors, a bad coding practice. To solve this issue, I had to add proper errors to the except statements and remove E722 from the ignore lists, which would prevent future contributors from leaving except statements bare.

Bad Coding Practice:
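
The snippet below is an illustrative sketch, not code taken from Pandas; `safe_divide` is a hypothetical function. The bare except is meant to handle division by zero, but it also hides an unrelated TypeError caused by a bad argument.

```python
def safe_divide(a, b):
    """Hypothetical helper: return a / b, or None if the division fails."""
    try:
        return a / b
    except:  # bare except: flake8 reports E722 on this line
        return None


print(safe_divide(1, 0))    # intended fallback for division by zero
print(safe_divide("1", 2))  # a caller bug (str / int) is silently hidden too
```

Both calls return None, so the second mistake goes unnoticed.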

Good Coding Practice:
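
The snippet below is an illustrative sketch, not code taken from Pandas; `safe_divide` is a hypothetical function. Naming the exception keeps the intended fallback for division by zero and lets every other error surface.

```python
def safe_divide(a, b):
    """Hypothetical helper: return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:  # only the error we actually expect
        return None


print(safe_divide(1, 0))  # intended fallback for division by zero

try:
    safe_divide("1", 2)   # a caller bug (str / int)
except TypeError as exc:
    print("unexpected error is no longer hidden:", exc)
```

With the specific exception in place, the TypeError reaches the caller instead of being swallowed.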

To make these changes, I had to review the built-in exceptions that except statements can catch in Python. The following picture shows part of a table with short descriptions of the different built-in exceptions. After studying each exception in depth, I often returned to this table to remind myself what each one deals with.
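
A similar quick-reference table can also be produced from Python itself. The sketch below is one possible way to do it, assuming the first line of each exception's docstring is a good enough short description; the function name `builtin_exception_summaries` is made up for this example.

```python
import builtins


def builtin_exception_summaries():
    """Map each built-in exception name to the first line of its docstring."""
    summaries = {}
    for name, obj in vars(builtins).items():
        # Keep only classes that derive from BaseException.
        if isinstance(obj, type) and issubclass(obj, BaseException):
            doc = (obj.__doc__ or "").strip()
            summaries[name] = doc.splitlines()[0] if doc else ""
    return summaries


for name, summary in sorted(builtin_exception_summaries().items()):
    print(f"{name:28} {summary}")
```

Aliases such as IOError show up as separate rows with the description of the class they point to.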

After learning the built-in exceptions, I had to go through the remaining bare except statements in Pandas, work out what the code in each try statement does, and determine which errors it can raise so that I could catch the proper exceptions. In some cases the code was easy to follow, but most of the time it was difficult to tell at first exactly what it does. However, with some dedication I was able to identify the errors and complete my contribution.
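
Besides reading the code, one way to get a first idea of which exceptions a call can raise is to run it on a few representative inputs and record what happens. The sketch below is only an illustration of that idea, not the method used for this contribution; `raised_exception_types` is a hypothetical helper, and the sample inputs are made up.

```python
def raised_exception_types(func, sample_inputs):
    """Call func with each argument tuple and collect the exception type names."""
    seen = set()
    for args in sample_inputs:
        try:
            func(*args)
        except Exception as exc:  # broad on purpose: surveying, not handling
            seen.add(type(exc).__name__)
    return sorted(seen)


# Which errors can int() raise for these inputs?
print(raised_exception_types(int, [("12",), ("abc",), (None,), ([],)]))
# prints ['TypeError', 'ValueError']
```

A survey like this only covers the inputs that were tried, so it complements reading the try block rather than replacing it.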



Link to the issue:

https://github.com/pandas-dev/pandas/issues/22872


Link to the pull request:


List of things that I contributed to Pandas:
  • Added specific errors to bare except statements in the following files:
    • doc/source/conf.py
    • pandas/tests/indexing/common.py
    • pandas/tests/io/formats/test_format.py
    • pandas/tests/io/test_pytables.py
    • pandas/tests/io/test_sql.py
    • pandas/tests/test_multilevel.py
    • pandas/tests/test_nanops.py
    • pandas/tests/test_panel.py
    • pandas/tests/test_strings.py
  • Removed E722 from the ignore lists, so bare except warnings are no longer muted.
  • Ensured that the except statements worked as expected.
  • Dealt with errors that occurred during CI checks and ensured that my changes met project standards so that they could be merged.
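
To illustrate what the E722 warning looks for, the sketch below finds bare except clauses with Python's `ast` module. This is not how flake8 implements the check; `find_bare_excepts` and the sample source are made up for this example.

```python
import ast


def find_bare_excepts(source):
    """Return the line numbers of bare `except:` clauses in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A handler with no exception type is a bare except.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


sample = '''\
try:
    value = int("x")
except:
    value = 0

try:
    value = int("y")
except ValueError:
    value = 0
'''

print(find_bare_excepts(sample))  # prints [3]
```

Only the first handler is reported, because the second one names ValueError.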

Conclusion



This contribution gave me further experience with working on a large project and experience with reviewing other people's code in order to identify the exceptions that it can throw. Furthermore, relearning built-in exceptions in Python was refreshing and beneficial to my future projects. 

Don't be afraid to contribute to a large open-source project. As you can see from my experience, nothing bad can come from attempting to contribute. In the worst case you won't be able to finish your contribution, but you will still gain valuable experience that will help you in the future.
