


Question

Python

This program replaces names from an input file with the string "**name**" and saves the result in an output file. Improve its regular expression, e.g., by adding an optional middle name or initial, suffixes, more prefixes, etc. You may use multiple regular expressions if you want. The program need not be perfect: it need not cover every possible way a name can be written, and it may miss some names or incorrectly replace something that is not a name, but it should do a reasonable job. You may assume that a name always starts with a prefix.
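One way to meet these requirements is a single pattern built from labeled parts. The sketch below uses `re.VERBOSE` so each optional piece sits on its own commented line. The extra prefixes (`Rev`, `Miss`, `Sir`), the suffix list, the last-name forms, and the helper name `replace_names` are illustrative assumptions, and, as the assignment allows, it will still miss some names.

```python
import re

# Illustrative pattern (an assumption, not a reference answer):
# required prefix, optional first name or initial, optional middle initial,
# required last name, optional suffix. re.VERBOSE ignores the layout
# whitespace and the "#" comments inside the pattern.
NAME_RE = re.compile(r"""
    \b(?:(?:Mrs|Ms|Mr|Dr|Prof|Rev)\.?|Miss|Master|Sir)      # prefix, dot optional
    (?:\s+[A-Z](?:[a-z]+|\.))?                              # optional first name or initial
    (?:\s+[A-Z]\.)?                                         # optional middle initial
    \s+(?:Mc|O')?[A-Z][a-z]+(?:-[A-Z][a-z]+)?               # last name, e.g. McCoy, O'Neil, Smith-Jones
    (?:,?\s+(?:Jr\.|Sr\.|Esq\.|(?:III|II|IV|MD|PhD|DO)\b))?  # optional suffix
""", re.VERBOSE)


def replace_names(text):
    """Replace every match of NAME_RE with the placeholder **name**."""
    return NAME_RE.sub("**name**", text)


sample = "Dr. John A. Smith Jr. met Ms. O'Neil and Prof. Smith-Jones, PhD."
print(replace_names(sample))
# **name** met **name** and **name**.
```

Because each part is on its own line, adding a prefix or suffix only means editing one alternation.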

Next, add code to replace email addresses with the string "**email**". Again, it need not be perfect, but it should do a reasonable job.
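A deliberately simple email pattern handles the common cases: allowed local-part characters, an `@`, one or more dot-terminated domain labels, and a letters-only top-level domain of at least two characters. This is a sketch under those assumptions, not a full RFC 5322 parser; `EMAIL_RE` and `replace_emails` are hypothetical names.

```python
import re

# Simplified pattern (an assumption): local part, "@", one or more domain
# labels each followed by ".", then a top-level domain of 2+ letters.
# Quoted local parts and other rare RFC 5322 forms are not handled.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def replace_emails(text):
    """Replace every email-like substring with the placeholder **email**."""
    return EMAIL_RE.sub("**email**", text)


print(replace_emails("Write to jane.doe@example.com or j_smith+news@mail.example.org."))
# Write to **email** or **email**.
```

A period that ends the sentence after an address is left in place, because the domain must end in letters.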

Explanation / Answer

I haven't worked on the email section of the program yet, but I've done some work on the name de-identification. My program is below, but I'm having trouble getting it to de-identify a prefix followed by only a last name (e.g., "Mr. Smith"). Can anyone help?
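Running the `nameRE` pattern as originally written in the program against a few made-up strings shows where it goes wrong. A minimal check:

```python
import re

# nameRE as originally written in the program (copied verbatim).
original = "(Mrs.|Ms.|Mr.|Dr.|Prof.|Master)? [A-Z](.|[a-z]+)? ([A-Z][a-z]+)* (DO|Esq.|Jr.|Sr.)?"

# 1. Three literal spaces are mandatory, so a prefix plus last name at the
#    end of the text cannot match and is left unchanged.
print(re.sub(original, "**name**", "Call Mr. Smith"))        # Call Mr. Smith

# 2. The prefix group ends in "?", so no title is needed: a capitalized
#    place name surrounded by spaces is replaced instead.
print(re.sub(original, "**name**", "in New York City now"))  # in**name**City now

# 3. An unescaped "." matches any character, so "Mrs." also matches "Mrsx".
print(re.search("Mrs.", "Mrsx") is not None)                 # True
```

The fixes follow from these: escape each period as `\.`, make the prefix required, and attach the whitespace to each optional part instead of leaving it mandatory between them.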

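```python
# This program removes names and email addresses occurring in a given input file and saves the result in an output file.

import re

def deidentify():
    infilename = input("Give the input file name: ")
    #outfilename = input("Give the output file name: ")

    with open(infilename, "r") as infile:
        text = infile.read()

    # replace names: the prefix is required (every name starts with one), periods
    # are escaped so they match only a literal ".", and each optional part carries
    # its own leading whitespace, so a prefix plus just a last name
    # (e.g. "Mr. Smith") still matches
    nameRE = (r"\b(?:(?:Mrs|Ms|Mr|Dr|Prof)\.|Master)"  # required prefix
              r"(?:\s+[A-Z](?:\.|[a-z]+))?"            # optional first name or initial
              r"(?:\s+[A-Z]\.)?"                        # optional middle initial
              r"\s+[A-Z][a-z]+"                         # last name (required)
              r"(?:,?\s+(?:DO\b|Esq\.|Jr\.|Sr\.))?")    # optional suffix
    deidentified_text = re.sub(nameRE, "**name**", text)
    print(deidentified_text)

    # replace email addresses

    #with open(outfilename, "w") as outfile:
    #    print(deidentified_text, file=outfile)

deidentify()
```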

To make checking the program easier, I've commented out the output-file lines and made the program print the result instead.
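When you switch back to writing an output file, it may help to separate the text processing from the file handling and to apply several regular expressions in turn, which the assignment allows. A sketch, assuming hypothetical helpers `deidentify_text` and `deidentify_file` and a placeholder `RULES` list whose two patterns are deliberately short stand-ins:

```python
import re


def deidentify_text(text, rules):
    """Apply each (pattern, placeholder) pair in list order and return the result."""
    for pattern, placeholder in rules:
        text = re.sub(pattern, placeholder, text)
    return text


def deidentify_file(infilename, outfilename, rules):
    """Read infilename, de-identify its text, and write the result to outfilename."""
    with open(infilename, "r") as infile:
        text = infile.read()
    with open(outfilename, "w") as outfile:
        outfile.write(deidentify_text(text, rules))


# Illustrative rules only; substitute the name and email patterns you settle on.
RULES = [
    (r"\b[\w.%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b", "**email**"),
    (r"\b(?:Mrs|Ms|Mr|Dr|Prof)\.\s+[A-Z][a-z]+", "**name**"),
]

print(deidentify_text("Mr. Smith wrote to ann@example.com", RULES))
# **name** wrote to **email**
```

A call such as `deidentify_file(infilename, outfilename, RULES)` could then replace the commented-out output lines; the `with` blocks close each file even if an error occurs.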