New Year - New Open-Source Contributions

Project Considerations for the Beginning of 2019


After spending a few months exploring different aspects of open source and blogging about what I found, it was nice to take a few weeks off and go on vacation. Now 2019 has begun, and I'm well rested and ready to jump back into contributing to big open-source projects and tackling more difficult bugs.

For the next few months I will most likely focus on a single major open-source project, so that I can fix complicated bugs that might take me a long time to figure out. Sticking to one project means I won't spend time on the startup costs of learning a new codebase.

Although I would like to continue working on Pandas, it is important for me to explore a couple of other options that interest me. In this post I will talk about three projects that I'm considering working on for the next few months.

Project 1 - Pandas

Pandas is a well-known open-source library for Python that provides high-performance, easy-to-use data structures and data analysis tools. It is often used in the machine learning field. The Pandas project has over 1,300 contributors and 18,000 commits, which makes it one of the biggest open-source projects related to machine learning.


I spent a considerable amount of time exploring this project during the fall of 2018. I enjoyed learning how to navigate through thousands of lines of code in this massive project and fixing different kinds of bugs. Now that I have some experience with the codebase, I can jump straight into fixing complicated bugs without the overhead of learning a new project.

In addition, Pandas is one of the few projects I have found that organizes and tracks its issues and pull requests very well, and it uses tags better than most projects. This kind of organization makes my life easier and helps me stay organized when making my contributions.

Project 2 - Scikit-learn

Scikit-learn is another well-known open-source machine learning library for Python that provides simple and efficient tools for data mining and data analysis. It includes tools for data preprocessing, a range of machine learning algorithms, and tools for dimensionality reduction and model selection. It has over 23,000 commits and 1,200 contributors, which makes it similar in size to the Pandas project.


I am currently working as a Research Assistant / Software Developer on my third machine learning project at Seneca. Over the last year, I have had to use many tools for data analysis, including scikit-learn. I would like to pursue a career in the machine learning field after I graduate from Seneca, and contributing to a project like scikit-learn would be very beneficial to me.

When looking for projects in 2018, I considered working on scikit-learn, but Pandas had better organization of issues and pull requests. Some of the issues in scikit-learn are missing tags, which makes it harder to find bugs to work on. However, now that I'm more familiar with open source and GitHub, I would seriously consider working on scikit-learn.

Project 3 - Firefox

Firefox is an open-source web browser developed by Mozilla. It is one of the biggest open-source projects in existence, and the majority of computer users have at least heard of Firefox, even if they don't use it every day. It is unlike the other two projects that I'm considering, but I would also like to explore new areas of programming and expand my knowledge. Contributing to Firefox would be a great experience.

I am currently looking for co-op/internship opportunities for the summer of 2019, and I came across some job postings from Mozilla. Contributing to one of their products would allow me to gain experience that would be beneficial to my co-op placement. Therefore, exploring this project is on my list of considerations. However, Firefox is not on GitHub, so diving into this project and learning its environment would involve big startup costs for me.


Conclusion

Pandas, scikit-learn, and Firefox are all great projects to contribute to, and each would give me a lot of the experience that I need. Now it is time to weigh the pros and cons of each project and choose one to dive into!
