I am trying to strip part of the file name, including the extension, from each entry in a list of files. I would like to loop through the items (files) and remove the extensions. I don't know how to loop through the items in the list properly, because re.sub requires a string as its third parameter, e.g. re.sub(pattern, repl, string, count=0, flags=0)
    import re

    file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
    file_lst_trimmed = []
    for file in file_lst:
        file_lst_trimmed = re.sub(r'1.fa', '', file)
The issue arising here is that re.sub expects a string and I want it to loop through a list of strings.
Thanks for any advice!
You can use a list comprehension to construct the new list with the cleaned-up file names.
\d is the regex that matches a single digit, and
$ only matches at the end of the string.
    file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]

    >>> file_lst_trimmed
    ['cats', 'cats', 'dog', 'dog']
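To see what the $ anchor changes, here is a small sketch that compares the anchored pattern with an unanchored one. The name 'dog1.fa.bak' is a made-up example that does not appear in the question.

```python
import re

# 'dog1.fa.bak' is a hypothetical name used only to show the difference
names = ['cats1.fa', 'dog1.fa.bak']

# With $, the digit and '.fa' are removed only at the very end of the name
anchored = [re.sub(r'\d\.fa$', '', n) for n in names]

# Without $, the same text is removed wherever it occurs
unanchored = [re.sub(r'\d\.fa', '', n) for n in names]

print(anchored)    # ['cats', 'dog1.fa.bak']
print(unanchored)  # ['cats', 'dog.bak']
```

Anchoring is usually the safer choice when the goal is to strip a suffix rather than any occurrence of the text.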
You can try this:
    import re

    file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
    # Raw string so the backslashes reach the regex engine unchanged
    final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]

    ['cats', 'cats', 'dog', 'dog']
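The difference between \d and \d+ only shows up once a name ends in more than one digit. A minimal sketch, assuming a hypothetical file 'cats10.fa' that is not in the original list:

```python
import re

name = 'cats10.fa'  # hypothetical multi-digit name

# \d removes exactly one digit before the extension
single_digit = re.sub(r'\d\.fa$', '', name)

# \d+ removes the whole run of trailing digits
digit_run = re.sub(r'\d+\.\w+$', '', name)

print(single_digit)  # cats1
print(digit_run)     # cats
```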
I prefer to use Python's built-in functions rather than importing a library where possible. Using regex for such a simple task might not be the best way to do it. This approach looks clean:

    file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
    file_lst_trimmed = []
    for file in file_lst:
        # Drop the last dot-separated piece and rejoin the rest into a string.
        # Note: this removes only the extension, not the trailing digit.
        file_lst_trimmed.append('.'.join(file.split('.')[:-1]))
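If the trailing number should go as well, the same no-regex idea can be extended with str.rstrip. This is only a sketch; the helper name trim is made up for illustration.

```python
import string

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']


def trim(name):
    """Hypothetical helper: drop the extension, then any trailing digits."""
    root = name.rsplit('.', 1)[0]      # text before the last dot
    return root.rstrip(string.digits)  # strip digits from the right end


file_lst_trimmed = [trim(name) for name in file_lst]
print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```

Using rsplit with a maximum of one split keeps any earlier dots in the name intact.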
No need for regex, use the standard library
os.path.splitext for this. From the documentation:
Split the pathname path into a pair (root, ext) such that root + ext
== path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored;
splitext('.cshrc') returns ('.cshrc', '').
    import os.path

    l = ['hello.fa', 'images/hello.png']
    # splitext returns a (root, ext) pair; [0] keeps only the root
    [os.path.splitext(filename)[0] for filename in l]
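As a rough sketch of how this applies to the question, the root returned by os.path.splitext can be passed through str.rstrip to drop the trailing digits too. The sample paths below are chosen for illustration.

```python
import os.path
import string

paths = ['cats1.fa', 'images/hello.png', '.cshrc']  # illustrative inputs

# Keep only the root part of each (root, ext) pair
roots = [os.path.splitext(p)[0] for p in paths]
print(roots)    # ['cats1', 'images/hello', '.cshrc']

# Optionally strip trailing digits, as the question asks
trimmed = [r.rstrip(string.digits) for r in roots]
print(trimmed)  # ['cats', 'images/hello', '.cshrc']
```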
Your loop is actually perfectly fine! There are two other issues.
First, you are setting file_lst_trimmed equal to your string on every iteration of the loop. You want to append to the list instead.
Second, your regular expression is '1.fa' when it should really just be '.fa' (assuming you only want to strip .fa extensions).
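Putting both fixes together, the loop from the question might look like the sketch below. Escaping the dot and anchoring with $ are assumptions added here so that only a literal '.fa' at the end of the name is removed.

```python
import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []
for file in file_lst:
    # Append each result instead of overwriting the list variable
    file_lst_trimmed.append(re.sub(r'\.fa$', '', file))

print(file_lst_trimmed)  # ['cats1', 'cats2', 'dog1', 'dog2']
```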
EDIT: I now see that you also want to remove the last number. In that case, you'll want a pattern like r'\d+\.fa'. (\d is a stand-in for any digit 0-9, and \d+ means a string of digits of any length, so this will remove 10, 11, 13254, etc. The \ before the . is there because . is a special character that needs to be escaped.) If you want to remove arbitrary file extensions, you'll want to put \w+ instead of fa, which matches a string of word characters (letters, digits, or underscores) of any length. You might want to check out the documentation for regex.
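To illustrate why the dot needs escaping, here is a short sketch. The names 'sample12xfa' and 'dog13254.fastq' are hypothetical and chosen only to show the behaviour.

```python
import re

name = 'sample12xfa'  # hypothetical name with no real extension

# An unescaped . matches any character, so 'x' is accepted here
loose_match = re.search(r'\d+.fa', name) is not None

# An escaped \. only matches a literal dot, so nothing is found
strict_match = re.search(r'\d+\.fa', name) is not None

print(loose_match)   # True
print(strict_match)  # False

# \w+ in place of fa strips any extension made of word characters
any_ext = re.sub(r'\d+\.\w+$', '', 'dog13254.fastq')
print(any_ext)       # dog
```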