Python this program replaces names from an input file by a string "**name**" a
Question
Python
This program replaces names from an input file with the string "**name**" and saves the result in an output file. Improve the regular expression: for example, add an optional middle name or initial, suffixes, more prefixes, and so on. You may use multiple regular expressions if you want. The program need not be perfect: it need not cover every possible way a name can be written, it may miss some names, and it may incorrectly replace something that is not a name, but it should do a reasonable job. You may assume that a name always starts with a prefix.
Next, add code to replace email addresses with the string "**email**". Again, it need not do a perfect job, but it should do a reasonable job.
Explanation / Answer
I haven't worked on the email section of the program yet, but I've done some work on the name deidentification. I've attached my program below. I'm having trouble getting it to deidentify a name that consists of only a prefix and a last name. Can anyone help?
# This program removes names and email addresses occurring in a given input file and saves it in an output file.
import re

def deidentify():
    infilename = input("Give the input file name: ")
    #outfilename = input("Give the output file name: ")
    infile = open(infilename,"r")
    text = infile.read()
    infile.close()
    # replace names
    nameRE = "(Mrs.|Ms.|Mr.|Dr.|Prof.|Master)? [A-Z](.|[a-z]+)? ([A-Z][a-z]+)* (DO|Esq.|Jr.|Sr.)?"
    deidentified_text = re.sub(nameRE,"**name**",text)
    print(deidentified_text)
    # replace email addresses
    #outfile = open(outfilename,"w")
    #print(deidentified_text, file=outfile)
    #outfile.close()

deidentify()
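A likely reason the prefix plus last name case fails: in the posted nameRE, the literal spaces between the optional groups are not optional themselves, so "Mr. Smith" only matches when the surname is followed by two more spaces. In addition, the unescaped "." matches any character, and the trailing "?" on the prefix group makes the prefix optional even though the assignment says a name always starts with one. Below is a minimal sketch of one way to restructure the pattern, not a definitive solution. The names NAME_RE and replace_names are hypothetical, the prefix and suffix lists are only examples, and the limit of one to three name parts is an assumption.

```python
import re

# Hypothetical pattern; prefix and suffix lists are illustrative, not exhaustive.
NAME_RE = re.compile(
    r"""
    \b(?:Mrs|Ms|Mr|Dr|Prof|Rev|Miss|Master)\.?   # required prefix, period optional
    (?:\s+(?!(?:Jr|Sr|Esq)\b)                    # a name part must not be a suffix
       (?:[A-Z][a-z]+|[A-Z]\.)){1,3}             # 1 to 3 capitalized words or initials
    (?:,?\s+(?:Jr|Sr|Esq|DO|MD|III|II)\b\.?)?    # optional suffix, comma optional
    """,
    re.VERBOSE,
)


def replace_names(text):
    """Replace every match of NAME_RE with the literal string **name**."""
    return NAME_RE.sub("**name**", text)


sample = "Mr. Smith met Dr. John Q. Public Jr. and Prof. Ada Lovelace, Esq. today."
print(replace_names(sample))
# → **name** met **name** and **name** today.
```

Each space now lives inside the group it belongs to, so a missing middle name or suffix no longer leaves a stray mandatory space behind. This sketch still misses names such as O'Brien or hyphenated surnames, and the optional period after a suffix can swallow a sentence-ending period.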
To make checking the program easier, I've commented out the output-file lines and had the program simply print the result.
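For the email step that the question asks for, a rough sketch could look like the following. The names EMAIL_RE and replace_emails are hypothetical, and the pattern is deliberately loose: it looks for a run of typical local-part characters, an "@", and a domain with at least one dot, and does not try to validate addresses fully.

```python
import re

# Hypothetical helper; a loose pattern, not a complete email address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def replace_emails(text):
    """Replace anything that looks like an email address with **email**."""
    return EMAIL_RE.sub("**email**", text)


print(replace_emails("Write to jane.doe@example.com or j_smith99@mail.example.org."))
# → Write to **email** or **email**.
```

Because each domain label must contain at least one character after its dot, a period that ends the sentence stays outside the match. In the posted program this would be a second re.sub call applied to deidentified_text under the "# replace email addresses" comment.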