October 22, 2021

Optimizing loop sequence

I am trying to check whether each item from a list occurs one or more times in a data frame column and, if it does, use the other values in each matching row to compute some totals.

The data frame has entries like this:

df = 

      prefix  value                          binary
---------------------------------------------------
0         30  yes  01010000101000000000000000001101
1         29  yes  01010000101001111110111110101011
2         29  no   10000000010011011011110001111011

The current code looks something like this:

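list1 = []  # per-item totals for rows where value == "yes"
list2 = []  # per-item totals for rows where value == "no"
for i, binary in enumerate(list_of_binary_numbers):
    print(f"Executing {i+1}")

    list1_tmp = 0
    list2_tmp = 0
    # Scan every row and keep those whose full binary string starts with `binary`.
    for index, row in df.iterrows():
        if binary == row["binary"][0 : len(binary)]:
            # Each matching row contributes 2 ** (32 - prefix) to its total.
            if row["value"] == "yes":
                list1_tmp += 2 ** (32 - int(row["prefix"]))
            elif row["value"] == "no":
                list2_tmp += 2 ** (32 - int(row["prefix"]))
    list1.append(list1_tmp)
    list2.append(list2_tmp)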

So basically, list_of_binary_numbers is a list of shortened binary numbers, and I need to check whether each shortened number matches the beginning of a full binary number in the df. That’s why I slice with [0 : len(binary)], so that both strings have the same length when they are compared.
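To make the check concrete, here is a small sketch of what happens for a single row. The full string and prefix are row 0 of the sample data frame; the shortened number `short` is made up for illustration and is not taken from my real list.

```python
# Row 0 of the sample data frame.
full = "01010000101000000000000000001101"
prefix = 30

# Hypothetical shortened binary number (13 digits), for illustration only.
short = "0101000010100"

# The comparison used in the loop: cut the full string to the shortened length.
matches = short == full[0 : len(short)]

# For plain strings, str.startswith expresses the same test.
same_test = full.startswith(short)

# Amount this row adds to its running total when it matches.
weight = 2 ** (32 - prefix)

print(matches, same_test, weight)
```

So a row counts as a hit when the shortened number is a leading substring of its binary column, and each hit adds 2 ** (32 - prefix) to the "yes" or "no" total.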

List looks like this:

list_of_binary_numbers =

0                   00000010011010000
1                 0000001001101000100
2            000000100110101000000110
3            000000100110101000000111
4             00000010011010100010000

The issue is that list_of_binary_numbers has roughly 150,000 items, and so does the data frame. Each iteration of the outer loop takes roughly 1 second, so at that rate the full run would take about 150,000 seconds (more than 40 hours).

I just can’t see any other good way to achieve this, so that’s why I am asking for some help.
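One direction that might avoid the nested loop, as a rough sketch only: go through the rows once, cut each full binary string at every length that occurs in the list, and accumulate the totals in dictionaries keyed by the shortened string. The function name `sum_by_prefix` and the sample `queries` are hypothetical, and plain lists stand in for the three data frame columns. I have not timed this on the real data.

```python
from collections import defaultdict


def sum_by_prefix(binaries, values, prefixes, queries):
    """Return (list1, list2) with the same totals the nested loops compute."""
    wanted = set(queries)                        # shortened numbers to look for
    lengths = sorted({len(q) for q in wanted})   # distinct lengths to slice at

    # totals[value][shortened string] -> accumulated 2 ** (32 - prefix)
    totals = {"yes": defaultdict(int), "no": defaultdict(int)}

    for full, value, prefix in zip(binaries, values, prefixes):
        if value not in totals:
            continue  # the nested loops ignore any other value as well
        weight = 2 ** (32 - int(prefix))
        for n in lengths:
            if n > len(full):
                break  # a longer shortened number cannot match a shorter row
            head = full[:n]
            if head in wanted:
                totals[value][head] += weight

    list1 = [totals["yes"].get(q, 0) for q in queries]
    list2 = [totals["no"].get(q, 0) for q in queries]
    return list1, list2


# Stand-ins for df["binary"], df["value"], df["prefix"] (the three sample rows).
binaries = [
    "01010000101000000000000000001101",
    "01010000101001111110111110101011",
    "10000000010011011011110001111011",
]
values = ["yes", "yes", "no"]
prefixes = [30, 29, 29]

# Hypothetical shortened numbers, chosen so that some of them match.
queries = ["0101000010100", "01010000101000", "1000", "111"]

list1, list2 = sum_by_prefix(binaries, values, prefixes, queries)
print(list1)  # [12, 4, 0, 0]
print(list2)  # [0, 0, 8, 0]
```

With the real data the call would presumably be sum_by_prefix(df["binary"], df["value"], df["prefix"], list_of_binary_numbers). The work is then roughly the number of rows times the number of distinct lengths in the list, instead of rows times list items.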