
Showing posts from March, 2019

DataFrame from Dict to Follow Insertion Order

This week I ran into a problem with the issue I started working on last week, so I will talk about the challenges I faced. Because my fix for that issue was not successful, I also found another issue to work on. I will describe this second issue too, and explain how I used my previous knowledge of Pandas to open a Pull Request with a possible fix.

Crosstab Dropna Issue

After discovering the issue last week and writing a test to help me solve it, I started analyzing its cause and possible solutions. I discovered that the issue originates in an aggregation function that is called inside the pivot_table function, which crosstab uses to build its DataFrame. The aggregation functions do not treat missing values as valid row or column names. Instead, they aggregate only over the existing values. In this case the aggregation function performs a calculation needed for crosstab a
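The sketch below is a simplified, pure-Python illustration of the cause described above. It is not pandas' actual code: `count_pairs` and `is_missing` are hypothetical helpers standing in for the aggregation step inside pivot_table. Because pairs with a missing row or column label are skipped before counting, the result can never contain a row or column for the missing value, so any later option to keep missing values has nothing left to keep.

```python
import math
from collections import Counter


def is_missing(value):
    """Hypothetical rule: treat None and float NaN as missing labels."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_pairs(row_labels, col_labels):
    """Count (row, col) label pairs, silently skipping pairs with a missing label.

    Hypothetical helper: mimics an aggregation step that only aggregates
    over existing labels, so missing labels never become rows or columns.
    """
    counts = Counter()
    for r, c in zip(row_labels, col_labels):
        if is_missing(r) or is_missing(c):
            continue  # the missing label is dropped before aggregation
        counts[(r, c)] += 1
    return counts


rows = ["a", "a", "b", float("nan")]
cols = ["x", None, "y", "x"]
table = count_pairs(rows, cols)
print(dict(table))  # {('a', 'x'): 1, ('b', 'y'): 1}
```

Two of the four input pairs disappear entirely, which is the kind of silent loss that makes a crosstab built on such an aggregation ignore missing labels.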

Another Dropna Bug in Pandas

This year I fixed a few bugs that dealt with NaN values, more specifically with the dropna function in Pandas. Each bug like this made me more interested in fixing behaviour that is inconsistent across the code base, specifically how functions deal with missing values. Some functions have a dropna argument: when it is set to True, the function should drop columns/rows that contain missing values, and when it is set to False, the function should keep them. Currently, some functions handle NaN values well; however, others either produce buggy behaviour with NaN or do not handle it consistently with the rest of the code base. This week I found another bug like this, and I will be working on it throughout next week.

Issue: https://github.com/pandas-dev/pandas/issues/10772

The creator of the issue noticed inconsistent behaviour when using the dropna argument inside the crosstab function
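To make the dropna contract described above concrete, here is a minimal pure-Python sketch operating on rows stored as tuples. `filter_rows` and `is_missing` are hypothetical helpers, not pandas functions, and treating both None and float NaN as missing is an assumption that mirrors how pandas generally treats them.

```python
import math


def is_missing(value):
    # Hypothetical rule: None and float NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_rows(rows, dropna=True):
    """Drop every row containing a missing value when dropna is True.

    Hypothetical helper illustrating the dropna contract, not a pandas API.
    """
    if not dropna:
        return list(rows)  # keep everything, including rows with missing values
    return [row for row in rows if not any(is_missing(v) for v in row)]


data = [(1, "a"), (2, None), (float("nan"), "c"), (4, "d")]
print(filter_rows(data))                     # [(1, 'a'), (4, 'd')]
print(len(filter_rows(data, dropna=False)))  # 4
```

A function that is consistent with this contract should give the same answer whether the missing value is None or NaN, and should never drop rows when dropna=False.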

Fixing a Couple Small Issues in Pandas

This week I managed to fix a couple of small issues in Pandas. One of them was opened because of behaviour observed in my previous Pull Request, where I ensured that the ordering of OrderedDicts and dicts in Python >= 3.6 was respected. In addition, I made another simple fix while working on a different issue. It didn't seem trivial at first, but in the end it was just a simple mistake in the code: the fix itself was easy, but it took some time to find where the problem originated. I will talk about fixing both of these small issues in more detail below.

Replacing Dicts with OrderedDicts in Aggregation Functions of Groupby

Issue: https://github.com/pandas-dev/pandas/issues/25692
Pull Request: https://github.com/pandas-dev/pandas/pull/25693

While working on a previous Pull Request, which ensured that the ordering of dicts and OrderedDicts was respected across different versions of Python, we came across o
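The pull request itself is not reproduced here. As a rough illustration of the ordering guarantee it relies on, the hypothetical `aggregate` helper below applies a spec of named functions and builds its result in the order the caller wrote the spec. Plain dicts preserve insertion order as a language guarantee from Python 3.7 (CPython 3.6 already did so as an implementation detail), and an OrderedDict preserves it on every version.

```python
from collections import OrderedDict
from statistics import mean


def aggregate(values, spec):
    """Apply each named function in `spec` to `values`, keeping spec's order.

    Hypothetical helper, not the pandas implementation: the result is an
    OrderedDict so the output order matches the spec on any Python version.
    """
    result = OrderedDict()
    for name, func in spec.items():  # iteration follows insertion order
        result[name] = func(values)
    return result


values = [3, 1, 2]
spec = {"max": max, "min": min, "mean": mean}
print(list(aggregate(values, spec)))  # ['max', 'min', 'mean']
print(sorted(spec))                   # ['max', 'mean', 'min'] if keys were sorted instead
```

Building the result as an OrderedDict rather than relying on a plain dict makes the intended order explicit, which is the same motivation as replacing dicts with OrderedDicts in the groupby aggregation code.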