python - How can I replace every instance of a character from 3 groups of characters with just 3 different characters respective

Updated: 2025-04-25

This is my input:

"Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

I need it to output an array that looks like this:

["Once", " ", "there", " ", "was", " ", "a", ",", "so", " ", "called", ",", "rock", ".", "it", ".", "was", " ", "not", ".", "in", " ", "fact", ",", "a", " ", "big", " ", "rock"]

The input has to go through the following rules to produce that punctuation:

spaceDelimiters  = " -_" 
commaDelimiters  = ",():;\""
periodDelimiters = ".!?"

Any spaceDelimiters character should be replaced with a space, any commaDelimiters character with a comma, and any periodDelimiters character with a period. When delimiters are adjacent, comma has priority over space, and period has priority over comma.

I got to a point where I could remove all of the delimiter characters, but I need them kept as separate elements of the array. I also need the hierarchy respected: periods override commas, which override spaces.

Maybe my approach is just wrong? This is what I've got:

import re

def split(string, delimiters):
    # Join the escaped delimiters into one alternation; re.split drops every match
    regex_pattern = '|'.join(map(re.escape, delimiters))
    return re.split(regex_pattern, string)

This ends up doing everything wrong. It's not even close.

asked Nov 19, 2024 at 11:51 by zealantanner

What is delimiters? – no comment, Nov 19, 2024 at 12:01
"If there's a spaceDelimiter character then it should replace it with a space." - you're not doing any replacing in your current code, you are splitting the input into parts. – C3roe, Nov 19, 2024 at 12:02
What are you actually trying to do here, in the grand scheme of things? I'm having a hard time coming up with a use case where you'd want to record spaces in an array like that. – CAustin, Nov 19, 2024 at 12:20
A weird project where I plan to make a speech synthesizer. I want the program to say each word and pause for the appropriate amount of time for spaces, commas and periods. There are a few other punctuation marks as well, but those can take the same amount of time as spaces, commas or periods. Hope that made sense. – zealantanner, Nov 19, 2024 at 12:25

2 Answers

Use the re library to split the text on word boundaries, then apply the replacements in order of precedence:

import re

s = "Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

# Split the string into tokens along word boundaries.
# Splitting on an empty match such as \b requires Python 3.7 or later.
tokens = re.split(r"\b", s)

def replaceDelimeters(token: str) -> str:
    # Each pattern matches a token that contains at least one delimiter
    # of the given class, together with the rest of that token.
    spaceDelimiters  = r"[^- _]*[- _]+[^- _]*"
    commaDelimiters  = r"[^,():;\"]*[,():;\"]+[^,():;\"]*"
    periodDelimiters = r"[^.!?]*[.!?]+[^.!?]*"

    # Substitute in precedence order: period first, then comma, then space.
    token = re.sub(periodDelimiters, ".", token)
    token = re.sub(commaDelimiters, ",", token)
    token = re.sub(spaceDelimiters, " ", token)
    return token

# Apply to every non-empty token
result = [replaceDelimeters(token) for token in tokens if token != ""]
print(result)

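This method returns "." as the last entry of the list. I don't know whether that is your desired behavior: your expected output omits it, but your rules imply it. Deleting the last entry if it is a period is easy enough in any case. One caveat: \b treats the underscore as a word character, so a token such as foo_bar is not split, and the space substitution then replaces the whole token with a single space.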
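As a minimal sketch of the cleanup mentioned above, a small helper can drop a final "." entry from the token list. The name drop_trailing_period is made up for illustration and is not part of the code above:

```python
def drop_trailing_period(tokens):
    """Return a copy of tokens without a final "." entry, if there is one."""
    if tokens and tokens[-1] == ".":
        return tokens[:-1]
    return list(tokens)


tokens = ["a", " ", "big", " ", "rock", "."]
print(drop_trailing_period(tokens))  # ['a', ' ', 'big', ' ', 'rock']
```

Only the last entry is inspected, so periods between sentences are left in place.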

You can do it with a single regular expression.

Define your rules in precedence order (from lowest to highest) with the replacement character as the initial character:

rules = {
    "space": " _-",  # keep - last so it stays a literal inside a character class
    "comma": ",():;\"",
    "period": ".!?",
}

Then build a regular expression with one alternative per rule, plus one for ordinary text. The "other" alternative matches one or more characters that belong to no rule. Each rule's alternative matches a run of delimiter characters that contains at least one character from that rule, surrounded by any number of characters from that rule or from lower-precedence rules. The highest-precedence rule goes earliest in the pattern so that it is tried first:

import re
from collections import deque

prev = ""
rule_patterns = deque()
for name, rule in rules.items():
    # prev accumulates this rule's characters plus all lower-precedence ones
    prev = rule + prev
    rule_patterns.appendleft(f"(?P<{name}>[{prev}]*?[{rule}][{prev}]*)")
rule_patterns.appendleft(f"(?P<other>[^{prev}]+)")

pattern = re.compile("|".join(rule_patterns))

This generates the pattern:

(?P<other>[^.!?,():;" _-]+)|(?P<period>[.!?,():;" _-]*?[.!?][.!?,():;" _-]*)|(?P<comma>[,():;" _-]*?[,():;"][,():;" _-]*)|(?P<space>[ _-]*?[ _-][ _-]*)
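To see the precedence at work, the following sketch compiles the generated pattern as a literal and reports which named group matches a few delimiter runs, most of them taken from the example sentence. The helper classify is hypothetical and exists only for this illustration:

```python
import re

# The generated pattern, written out as a literal for a standalone check.
pattern = re.compile(
    r'(?P<other>[^.!?,():;" _-]+)'
    r'|(?P<period>[.!?,():;" _-]*?[.!?][.!?,():;" _-]*)'
    r'|(?P<comma>[,():;" _-]*?[,():;"][,():;" _-]*)'
    r'|(?P<space>[ _-]*?[ _-][ _-]*)'
)


def classify(run):
    """Return the name of the group that matches the whole run, or None."""
    match = pattern.fullmatch(run)
    return match.lastgroup if match else None


for run in ["     ", " (", ".,", ", .", "rock"]:
    print(repr(run), "->", classify(run))
# '     ' -> space
# ' (' -> comma
# '.,' -> period
# ', .' -> period
# 'rock' -> other
```

A run is classified by the highest-precedence delimiter it contains, regardless of where in the run that delimiter sits.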

Then given your value:

value = "Once there     was a (so-called) rock. it.,was not! in fact, a big rock."

You can then find all the matches and, wherever a rule matched, output the first character of that rule instead of the matched text:

matches = [
    next(
        (rule[0] for name, rule in rules.items() if match.group(name)),
        match.group("other")
    )
    for match in pattern.finditer(value)
]

print(matches)

Outputs:

['Once', ' ', 'there', ' ', 'was', ' ', 'a', ',', 'so', ' ', 'called', ',', 'rock', '.', 'it', '.', 'was', ' ', 'not', '.', 'in', ' ', 'fact', ',', 'a', ' ', 'big', ' ', 'rock', '.']
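For the speech-synthesizer use case mentioned in the comments, a token list like this could be turned into a schedule of words and pauses. The sketch below is only one possible approach: the PAUSES durations and the to_schedule helper are assumptions made up for illustration, not values or names from the question.

```python
# Hypothetical pause lengths in seconds; tune them for the synthesizer.
PAUSES = {" ": 0.1, ",": 0.3, ".": 0.6}


def to_schedule(tokens):
    """Pair each word with the pause that follows it (0.0 if none)."""
    schedule = []
    for token in tokens:
        if token in PAUSES:
            # Attach the pause to the preceding word; a delimiter with
            # no word before it is ignored.
            if schedule:
                word, _ = schedule[-1]
                schedule[-1] = (word, PAUSES[token])
        else:
            schedule.append((token, 0.0))
    return schedule


tokens = ["Once", " ", "there", ",", "a", " ", "rock", "."]
for word, pause in to_schedule(tokens):
    print(f"{word}: pause {pause}s")
```

Because every delimiter run has already been collapsed to a single " ", "," or ".", each word is followed by at most one pause entry.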
