Open Source Contribution - Release 0.3 Summary

Contributing to pySearch and Pandas

The project pySearch is written in Python and is essentially a cross-platform command-line utility that searches popular sites such as Amazon, Google, and Stack Overflow and displays the results. The idea is to enter "search commands" on the command line and get results from those websites.


Pandas is a well-known open-source library for Python that provides high-performance, easy-to-use data structures and data analysis tools. It is often used in the machine learning field. The Pandas project has over 1,300 contributors and 18,000 commits, which makes it one of the biggest open-source projects related to machine learning.


The Idea Behind Release 0.3

In Release 0.1, I got involved by contributing unit tests to a project written in Node.js. This was a fairly easy way to get started with open source because my professor was a moderator for the project, so I was able to get guidance as I began exploring open-source development.

Once I had some experience with Git and GitHub and understood the idea behind open source, I was ready to dive into Hacktoberfest and complete 6 pull requests across a small, a medium, and a huge project. This was Release 0.2, and the idea behind it was to explore as many different ways of contributing to open source as possible.

Now that I have some experience with different aspects of open source and with contributing to outside projects, a few of my classmates and I decided to work on our own internal project. The project we decided to create and grow is pySearch, which I talked about at the beginning of this blog. Throughout Release 0.3 and 0.4, I am trying to grow this project with my classmates and hopefully get others to contribute. In addition to the internal project, I'm continuing to work on Pandas to become more familiar with its code so that I can solve more complex bugs in Pandas in the future.


My Contributions to pySearch

The first issue that I worked on in pySearch prevents the program from opening invalid websites and domains when it runs. In my pull request, I fix this issue by using Invoke-RestMethod from PowerShell on Windows and a cURL POST request on Mac and Linux. These calls check that the URL specified by the program (which can be modified through command-line arguments) is valid and exists. If the check passes, my function returns success and the program proceeds to open the URL specified by the user. If there is an issue, my function loops through popular domains to see whether the problem is caused by an incorrect domain (for example, stackoverflow.ca is not valid, while stackoverflow.com is). If a valid domain is found, the program notifies the user that the domain has been changed and opens the URL with the corrected domain. Lastly, if no valid domain is found, the program does not open anything and notifies the user that the URL is invalid.
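
The flow described above can be sketched in Python. This is only a rough sketch, not the code from the pull request: it swaps the Invoke-RestMethod and cURL calls for Python's standard urllib module, and the names url_exists, swap_tld, resolve_url, and the FALLBACK_TLDS list are hypothetical and chosen only for illustration.

```python
from http.client import HTTPException
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Hypothetical fallback list; the post does not say which domains pySearch tries.
FALLBACK_TLDS = ("com", "org", "net")


def url_exists(url, timeout=5):
    """Return True if the URL answers a request without raising an error."""
    try:
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError, HTTPException):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # ValueError covers URLs without a usable scheme.
        return False


def swap_tld(url, tld):
    """Return the URL with its top-level domain replaced by tld."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    base, _, _ = host.rpartition(".")
    if not base:
        return url  # no dot in the host, nothing to swap
    new_host = f"{base}.{tld}"
    if parts.port:
        new_host = f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=new_host))


def resolve_url(url, exists=url_exists, tlds=FALLBACK_TLDS):
    """Return (url_to_open, message); url_to_open is None if nothing works."""
    if exists(url):
        return url, "ok"
    for tld in tlds:
        candidate = swap_tld(url, tld)
        if candidate != url and exists(candidate):
            host = urlsplit(candidate).hostname
            return candidate, f"domain changed to {host}"
    return None, "invalid URL"
```

Calling resolve_url("https://stackoverflow.ca/search?q=python") would try the original URL first and then its .com, .org, and .net variants. The exists argument is there so the network check can be replaced with a stub when testing.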


The second issue that I worked on in pySearch increases the search engine coverage. Prior to my pull request, pySearch only supported Google, Stack Overflow, Twitter, and Amazon. However, there are many more search engines that can be searched. The problem is that different search engines use different query parameters for the search terms. To expand the coverage, I added separate cases for other popular search engines.

List of added search engines:
  • yandex
  • duckduckgo
  • archive
  • boardreader
  • ask
  • facebook
  • yahoo
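
To illustrate what "different query parameters" means, here is a minimal sketch of a lookup table that maps an engine name to its base URL and parameter name. It is not the pySearch implementation: SEARCH_ENGINES and build_search_url are hypothetical names, only a few engines are shown, and the URL patterns are assumptions that should be checked against each site.

```python
from urllib.parse import urlencode

# Assumed URL patterns for illustration; each entry is (base URL, query parameter).
SEARCH_ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "duckduckgo": ("https://duckduckgo.com/", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_url(engine, terms):
    """Build a search URL for the given engine and search terms."""
    try:
        base, param = SEARCH_ENGINES[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported search engine: {engine}") from None
    # urlencode escapes spaces and special characters in the search terms.
    return f"{base}?{urlencode({param: terms})}"
```

With a table like this, supporting another engine is a matter of adding one entry rather than another branch in the code.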


Contributing to Pandas Throughout Release 0.3 and 0.4

Throughout Release 0.3, I was trying to set myself up with a good amount of work for a few weeks in the Pandas project. Pandas is such a big project that it is often hard to find bugs to work on, because many people are looking to contribute and bugs are often claimed within hours, if not minutes, of being posted. During Hacktoberfest, I had to constantly refresh the Pandas issues page to have a chance of claiming something to work on. To get ideas for contributions that had not been filed yet, I asked one of the moderators for suggestions. He suggested some ways that I could find and file bugs myself, before the moderators file them. By creating and organizing issues this way, I save the moderators some time, since they don't need to do it themselves later on. He suggested that I start by fixing the linting errors that are currently ignored within the Pandas project.

Here are the commands that I can run from the command line within the Pandas project to view the linting errors:

  • ./scripts/validate_docstrings.py

When this runs, it reports over 7,600 errors that all eventually need to be fixed within Pandas. Fixing all of them at once would be very time-consuming, so moderators pick out specific errors in specific directories and open issues asking contributors to fix 10-20 errors at a time. I will be going through these errors and creating issues for myself, as well as for other contributors, to work on. To narrow down the list of 7,600 errors, I will need to pick out specific error codes and focus on one at a time.

To pick a specific error, the command looks like this: ./scripts/validate_docstrings.py --errors=PR10
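
As far as I can tell, PR10 is the code reported when a parameter in a docstring is missing the space before the colon that separates its name from its type. The sketch below shows a docstring with that problem and a tiny checker for it. The function rotate and the helper names_missing_space are hypothetical, and the regular expression is a simplification, not the logic that validate_docstrings.py uses.

```python
import re


def rotate(values, periods=1):
    """
    Rotate a list by a number of positions.

    Parameters
    ----------
    values : list
        Items to rotate.
    periods: int
        Number of positions to rotate by.
    """
    return values[-periods:] + values[:-periods]


# Matches a line that starts with a name immediately followed by a colon.
NAME_THEN_COLON = re.compile(r"^\s*(\w+):")


def names_missing_space(docstring):
    """Return parameter names written as "name:" instead of "name :"."""
    names = []
    for line in docstring.splitlines():
        match = NAME_THEN_COLON.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Here names_missing_space(rotate.__doc__) flags periods, and changing that line to "periods : int" clears it.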


  • flake8-rst

Flake8 is another tool that Pandas uses to find linting errors within the project. It can be run alongside validate_docstrings.py to ensure that the style and format of the code and documentation within Pandas are consistent and up to the current standards.

To pick a specific error, the command looks like this: flake8-rst --select E902



Summary

Throughout Release 0.3, I focused on working on an internal open-source project with my classmates and developing a command-line tool for searching popular search engines. This was an interesting experience, and I am planning to continue developing this tool in Release 0.4.

Furthermore, I set myself up with many issues in the Pandas project, which I will be creating from the linting errors that still need to be addressed. This feels very rewarding, because not only am I contributing to Pandas, I'm also helping moderators pick out specific errors and create issues that they would need to create anyway in the future. Since the moderators introduced flake8 and validate_docstrings.py to the project, they've been trying to address one error at a time, and there are still many left.

I will see you in Release 0.4, where I will have more information on the linting errors within Pandas and the state of the pySearch project.












