Open Source Contribution - Release 0.4 Part 1

November 26, 2018

Further Contributions to Pandas and pySearch

Throughout Release 0.4 I will continue contributing to Pandas and further expand the pySearch project. This release will be very similar to 0.3, but I will tackle different issues within these projects. One new area that I have never worked with and will be exploring is CI (Continuous Integration).

Contributing to Pandas

Last week I started exploring linting errors within Pandas in further detail. While doing so, I realized that Pandas contributors were causing some of these errors deliberately, in an attempt to format their documentation so that it is easier to read. For example, one of the errors dealt with "unnecessary indentation". Although the Pandas program that checks docstrings for these types of errors flagged the indentation, the contributors had added it on purpose to make the information easier to read. Therefore, not all linting errors can be fixed easily. Some require discussion with the Pandas moderators, either to find a better way to write the documentation without such indentation or to keep ignoring certain errors in favour of better readability. I will talk further with the moderators to see how we can tackle these issues.

One type of error that I fixed was "Use only one blank line to separate sections or paragraphs". There were a total of 127 errors of this kind. In a massive project like Pandas, it is important to keep documentation and spacing consistent throughout the project. In addition, in a codebase this large, trimming unnecessary lines keeps the files slightly smaller, although the real benefit is consistency and readability rather than performance. My pull request fixed most of these errors. It was a rather tedious job because I had to go through many files and remove unnecessary blank lines within the docstrings. Sometimes finding the offending lines was easy because I had a file and a line number pointing to the general area of the docstring where the error occurs. At other times no line number was shown, or the docstring was harder to track down because it is reused in different files but defined somewhere else.

Linting issue: https://github.com/pandas-dev/pandas/issues/23870

Linting pull request: https://github.com/pandas-dev/pandas/pull/23871
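
To make the "only one blank line" rule concrete, here is a minimal sketch of how such a check could work. This is not the checker that Pandas actually uses. The function names find_extra_blank_lines and collapse_blank_lines and the sample docstring are my own illustrations, and the sketch only looks at a single docstring string rather than walking through a whole project.

```python
# Hypothetical helpers for illustration only; not part of Pandas.

def find_extra_blank_lines(docstring):
    """Return 1-based line numbers of blank lines that directly follow another blank line."""
    extra = []
    previous_blank = False
    for number, line in enumerate(docstring.splitlines(), start=1):
        is_blank = not line.strip()
        if is_blank and previous_blank:
            extra.append(number)
        previous_blank = is_blank
    return extra


def collapse_blank_lines(docstring):
    """Return the docstring with every run of blank lines reduced to one blank line."""
    kept = []
    previous_blank = False
    for line in docstring.splitlines():
        is_blank = not line.strip()
        # Keep the line unless it is a second (or later) blank line in a row.
        if not (is_blank and previous_blank):
            kept.append(line)
        previous_blank = is_blank
    return "\n".join(kept)


# Made-up docstring with two blank lines after the summary.
SAMPLE = "Summary line.\n\n\nParameters\n----------\nx : int\n"

if __name__ == "__main__":
    print(find_extra_blank_lines(SAMPLE))
    print(collapse_blank_lines(SAMPLE))
```

A check like this reports where the extra blank line sits inside the docstring, but not where the docstring itself is defined, which is roughly why shared docstrings were the harder cases to track down.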

In addition to the linting errors, I found an issue that got me very interested. The issue asks for the remaining CircleCI checks to be merged into Azure Pipelines. Once those checks have been moved to Azure Pipelines, CircleCI will be removed from the Pandas project. I'm currently exploring different ways to achieve this and will talk about this issue in more detail in the next blog post.

CI issue: https://github.com/pandas-dev/pandas/issues/23821

Plans for pySearch

The pySearch project has been refactored and restructured by its creator. We agreed not to merge any pull requests until this process was complete. Now that he has finished restructuring the project, I will need to go back and update my pull requests so that their changes reflect the new structure.

Furthermore, the experience I gain from my recent Pandas issue will be helpful for the pySearch project, as I will be able to help integrate a Continuous Integration service such as Travis CI, CircleCI, or Azure Pipelines into the project and take some of the load off the moderators. With CI in place, the moderators will have to worry much less about new pull requests breaking the program.

