Ok I keep refining this problem so it will be in the most clear and easiest to m
ID: 3916144 • Letter: O
Question
I keep refining this problem so that it is in the clearest and easiest-to-modify format. I have now hardcoded two example dataframes so you can simply copy and paste them into a Jupyter notebook to see the dataframes. I am using the pandas package.

I'm trying to search through the Department column of df2 and find any substrings that appear in the Dept_Nbr column of df5. When one of these matching substrings is found, I want my code to write the exact contents of the corresponding cell in the Dept_Desc_good column of df5 to the TrueDepartment column of df2, unless there is a value in the corresponding Dept_Desc_better column of df5, in which case I want to write the exact contents of that cell to the corresponding row of the TrueDepartment column of df2.

df5 has consecutive department numbers that go from 1 to over 9000. In df5 there are no empty cells in the Dept_Nbr column or the Dept_Desc_good column. The last two columns of df5 are sparsely populated. df2 has many missing values throughout. The dataframes I created below are just examples of some of the situations that occur in my actual full dataframe.

The department numbers that occur as substrings in the Department column of df2 usually look something like 05-9632, or sometimes occur in cells with only digits and nothing else (e.g. "9700" or "27"). The substrings with hyphens need to be recognized as 9632 or, more generally, whatever digits come after the hyphen. I probably need to use re (regular expressions) for this part. If a cell in the Department column of df2 contained something like "Market 466", I would not want this to be identified as a department number, because it doesn't meet the guidelines of being only digits in a cell (since it includes "Market") or being digits following a hyphen.
After running the code as it currently is, I receive the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-9-5f3724025de0> in <module>()
      8 for n in numbers:
      9     for i in df5.index:
---> 10         if n in df2.loc[i, 'Department']:
     11             if df5.at[int(n), 'Dept_Desc_better']: #if values exists
     12                 df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_better')

TypeError: 'in <string>' requires string as left operand, not int

I tried changing line 10 to include [str(i), 'Department']: but that left me with a different error.
import pandas as pd
import re
import numpy as np

data = [
    ['Empty', 'CMI-General Liability | 05-9632', 'Empty', 'Empty'],
    ['Empty', 'Central Operations', 'Empty', 'Empty'],
    ['Empty', 'Alarm Central 05-8642', 'Empty', 'Empty'],
    ['Empty', 'Market 466', 'Empty', 'Empty'],
    ['Empty', 'Talent, Experience', 'Empty', 'Empty'],
    ['Empty', 'Food Division', 'Empty', 'Empty'],
    ['Empty', 'Quality WMCC', 'Empty', 'Empty'],
    ['Empty', 'Modular Execution Team | 01-9700', 'Empty', 'Empty'],
    ['Empty', 'US Central Operations', 'Empty', 'Empty'],
    ['Empty', 'CE - Engineering - US', 'Empty', 'Empty'],
    ['Empty', 'Fresh, Freezer & Cooler - 18-8110', 'Empty', 'Empty'],
    ['Empty', '9701', 'Empty', 'Empty'],
    ['Empty', 'Contact Center', 'Empty', 'Empty'],
    ['Empty', 'Central Operations', 'Empty', 'Empty'],
    ['Empty', 'US Central Operations', 'Empty', 'Empty'],
    ['Empty', 'Private Brands GM - 01-8683', 'Empty', 'Empty'],
]
df2 = pd.DataFrame(data, columns=['JobTitle', 'Department', 'TrueDepartment', 'Dept_Function'])

data5 = [
    [1, 'TRUCKING, MARCY, NY', 'Empty', 'Empty'],
    [2, 'TRUCKING-GREENVILLE,TN', 'Empty', 'Empty'],
    [3, 'DC 40, HOPE MILLS, NC', 'Empty', 'Empty'],
    [4, 'TRUCKING, SHARON SPRINGS', 'Empty', 'Empty'],
    [5, 'DISP PAULS VALLEY OK FDC', 'Empty', 'Empty'],
    [6, 'COLDWATER, MI', 'Empty', 'Empty'],
    [7, 'AMERICOLD LOGISTICS', 'Empty', 'Empty'],
    [8, 'DFW3N FORT WORTH FC WHS.COM', 'Empty', 'Empty'],
    [9, 'PCCC CURRENTLY BEING REVIEWED', 'Empty', 'Empty'],
    [466, 'Springfield, MO', 'Empty', 'Empty'],
    [8110, 'Fresh Dept', 'Empty', 'Empty'],
    [8642, 'Security', 'Security & Compliance', 'Empty'],
    [8683, 'General Merchandise', 'Empty', 'Empty'],
    [9362, 'General Liability', 'Empty', 'Empty'],
    [9700, 'Execution Team', 'Empty', 'Empty'],
    [9701, 'Produce TN', 'Empty', 'Empty'],
]
df5 = pd.DataFrame(data5, columns=['Dept_Nbr', 'Dept_Desc_good', 'Dept_Desc_better', 'Dept_Abrv'])

numbers = df5['Dept_Nbr'].tolist()
df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
df5.set_index('Dept_Nbr')
for n in numbers:
    for i in df5.index:
        if n in df2.loc[i, 'Department']:
            if df5.at[int(n), 'Dept_Desc_better']: #if values exists
                df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_better')
            else:
                df2.at[i, 'TrueDepartment'] = df5.at(int(n), 'Dept_Desc_good')
A couple of examples: when I get the code working properly, it would find "9632" in the Department column of df2 and write the entire contents of the corresponding cell in the Dept_Desc_good column of df5 to the TrueDepartment column of df2. In the case of dept 9632, it would write "General Liability" to the TrueDepartment column of df2. If the search finds "8642" in the Department column of df2, it would write "Security & Compliance" to the corresponding TrueDepartment cell of df2. It writes "Security & Compliance" instead of "Security" because there is a value in the Dept_Desc_better column of df5 for that department.
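The matching rule described in the question (a cell that contains only digits, or digits that directly follow a hyphen) could be expressed with re roughly as follows. This is a minimal sketch, not a definitive pattern: the names DEPT_PATTERN and extract_dept_nbr are hypothetical, and it assumes the first run of digits immediately after a hyphen is the department number.

```python
import re

# Hypothetical pattern (an assumption, not from the original post):
#   group 1: the whole cell is digits, e.g. "9700" or "27"
#   group 2: digits that directly follow a hyphen, e.g. "05-9632"
DEPT_PATTERN = re.compile(r"^\s*(\d+)\s*$|-(\d+)\b")


def extract_dept_nbr(cell):
    """Return the department number as an int, or None if no rule applies."""
    if not isinstance(cell, str):
        return None  # missing values (None / NaN) never match
    match = DEPT_PATTERN.search(cell)
    if match is None:
        return None
    # Exactly one of the two groups is filled when a match is found.
    return int(match.group(1) or match.group(2))


for text in ["CMI-General Liability | 05-9632", "Market 466", "9701"]:
    print(repr(text), "->", extract_dept_nbr(text))
```

With this pattern, "Market 466" and "CE - Engineering - US" yield None, because their digits neither fill the whole cell nor directly follow a hyphen.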
Explanation / Answer
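You can resolve the TypeError by replacing

    if n in df2.loc[i, 'Department']:

with

    if str(n) in df2.loc[i, 'Department']:

The in operator on a string requires a string as its left operand, and n is an int taken from the Dept_Nbr column, so it has to be converted with str() first.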
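Note that a plain substring test would also match cells such as "Market 466", which the question wants to exclude, and that the placeholder string 'Empty' is truthy, so a bare if on Dept_Desc_better would always pick that column. As a hedged alternative to the nested loops, the lookup could be done with pandas string methods. The sketch below uses small stand-in dataframes, assumes blank cells hold the string 'Empty' as in the example data, and uses a regular expression that is an assumption rather than part of the original post. The variable names best, lookup, parts, dept_nbr and mapped are illustrative.

```python
import pandas as pd

# Small stand-ins for df2 and df5; "Empty" marks a blank cell as in the question.
df2 = pd.DataFrame({
    "Department": ["Alarm Central 05-8642", "Market 466", "9701",
                   "Private Brands GM - 01-8683", None],
    "TrueDepartment": ["Empty"] * 5,
})
df5 = pd.DataFrame({
    "Dept_Nbr": [466, 8642, 8683, 9701],
    "Dept_Desc_good": ["Springfield, MO", "Security",
                       "General Merchandise", "Produce TN"],
    "Dept_Desc_better": ["Empty", "Security & Compliance", "Empty", "Empty"],
})

# Prefer Dept_Desc_better unless it only holds the "Empty" placeholder.
best = df5["Dept_Desc_better"].where(df5["Dept_Desc_better"] != "Empty",
                                     df5["Dept_Desc_good"])
lookup = dict(zip(df5["Dept_Nbr"].astype(str), best))

# Assumed pattern: an all-digit cell, or digits directly after a hyphen.
parts = df2["Department"].str.extract(r"^\s*(\d+)\s*$|-(\d+)\b")
dept_nbr = parts[0].fillna(parts[1])

# Overwrite TrueDepartment only where a known department number was found.
mapped = dept_nbr.map(lookup)
df2["TrueDepartment"] = mapped.fillna(df2["TrueDepartment"])

print(df2)
```

Department numbers are compared as strings here, so a cell such as "05-0027" would not match Dept_Nbr 27 without stripping the leading zeros first.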