Posts

Showing posts from February, 2019

Working with Incomplete MultiIndex keys in Pandas

Incomplete MultiIndex keys in Dropna

This week I opened a Pull Request for the Pandas bug that I discussed last week. Although I think my solution can be improved further, it does fix the immediate issue of incomplete MultiIndex keys in the subset parameter of the "dropna()" function. I submitted the Pull Request to gather feedback on my solution and to improve it at the moderator's discretion.

Overview of the Issue

https://github.com/pandas-dev/pandas/issues/17737

The creator of the issue reported a bug that prevents MultiIndex keys from being incomplete in the subset parameter of the "dropna()" function. The DataFrame.dropna(...) function removes missing values. If you pass it a "subset=..." parameter, it looks for missing values only in the columns listed in the subset when deciding which rows to remove. I created some sample code, following the examples that the moderator gave and here a...
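To make the role of the subset parameter concrete, here is a minimal sketch. The DataFrame, its column labels, and the values are hypothetical and are not taken from the issue. It shows "dropna()" with a complete MultiIndex key, and then one possible way to handle an incomplete key by expanding it into complete keys by hand. This is an illustration under those assumptions, not the fix submitted in the Pull Request.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two-level (MultiIndex) columns.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame(
    [
        [1.0, 2.0, 3.0],        # row 0: no missing values
        [np.nan, 5.0, 6.0],     # row 1: missing in ("a", "x")
        [7.0, np.nan, 9.0],     # row 2: missing in ("a", "y")
        [10.0, 11.0, np.nan],   # row 3: missing in ("b", "x")
    ],
    columns=columns,
)

# Complete key: only column ("a", "x") is checked for missing values.
full_key_result = df.dropna(subset=[("a", "x")])

# Incomplete key "a": expand it by hand into every complete key whose
# first level is "a", then pass those complete keys as the subset.
expanded_keys = [col for col in df.columns if col[0] == "a"]
partial_key_result = df.dropna(subset=expanded_keys)

print(full_key_result.index.tolist())
print(partial_key_result.index.tolist())
```

Expanding the key manually keeps the example independent of how a given Pandas version treats incomplete keys in subset.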

Solving the Next Bug in Pandas

This week I went searching for another issue in Pandas, using the same tactic I adopted this year: finding and solving older bugs that nobody has fixed yet. There are over 2,800 issues in Pandas, and it takes a while to find a bug that you're interested in working on, but fixing one brings a real sense of satisfaction and accomplishment. Before we dive into the new bug, here is a quick update on the bug I was working on last week.

Update on Last Week's Bug

Issue: https://github.com/pandas-dev/pandas/issues/21510
Pull Request: https://github.com/pandas-dev/pandas/pull/25224

The moderators' feedback on the issue that I previously observed, relating to the ordering of regular dictionaries in Python 3.6 and 3.7, was outdated. Previously, I stated that the ordering of regular dictionaries in Python 3.6 is an implementation detail of CPython, and users should not rely on it. Howev...

Fixing Order of OrderedDict in Pandas

This week I've been working on the same issue as in last week's blog post, producing a solution that takes the moderators' requests into account. Although I made a pull request and think I'm on the right track, I'm waiting for the moderators' feedback before I proceed any further.

Overview of the Issue

https://github.com/pandas-dev/pandas/issues/21510

The creator of the issue reported a bug that occurs when combining the concat function with an OrderedDict. An OrderedDict is an ordered dictionary, which means it should remember the order in which items were added. However, when the dictionary is passed to concat, the order is lost. Here are the observations:

In [2]: from collections import OrderedDict
In [3]: pd.concat(OrderedDict([('First', pd.Series(range(3))),
   ...: ...
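As a rough illustration of the ordering concern described above, the sketch below builds an OrderedDict whose labels are deliberately not in alphabetical order and passes its keys to concat explicitly through the keys parameter. The labels and Series values are hypothetical and do not come from the issue. Passing keys explicitly is one way to pin the order regardless of how a given Pandas version orders a mapping on its own; it is not the change proposed in the pull request.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical labels, inserted in a non-alphabetical order on purpose.
pieces = OrderedDict(
    [
        ("zebra", pd.Series(range(2))),
        ("apple", pd.Series(range(2))),
        ("mango", pd.Series(range(2))),
    ]
)

# Passing keys explicitly tells concat which order to use for the
# outer index level, instead of leaving it to the mapping handling.
result = pd.concat(pieces, keys=list(pieces))

# Outer-level labels in the order they appear in the result.
outer_labels = list(result.index.get_level_values(0).unique())
print(outer_labels)
```

If the printed labels follow the insertion order rather than alphabetical order, the dictionary's order has survived the concatenation.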

Using Unit Tests to Fix a Bug in Pandas

The issue I decided to work on this week is one of those I mentioned two weeks ago when I was planning my work for the next couple of months. This time, I'm building on what I learned about unit tests in Pandas last week and using them to fix an outstanding bug.

Overview of the Issue

https://github.com/pandas-dev/pandas/issues/21510

The issue was posted on June 16, 2018. It's a fairly old issue, and one person attempted to solve it. However, the moderators requested changes from him, and he never committed anything that satisfied their requests. After a while, one of the moderators stated that the pull request was stale and closed it.

On December 2, 2018, after ensuring that the issue was not yet resolved, one of the moderators updated the issue to have a "Contributions Welcome" milestone. Seeing that the moderators are still inter...