I have a data table where one variable is messy and can contain different variants of the same value (e.g., team name Newcastle United or Newcastle). These variants occur alongside another grouping-like variable (e.g., both the Premier League and A-League have Newcastle clubs with different team name variants):
library(dplyr)  # also re-exports tibble()
team = c('Newcastle United', 'Newcastle', 'Newcastle Utd', 'Newcastle United Jets', 'Newcastle', 'Newcastle Jets')
competition = c('Premier League', 'Premier League', 'Premier League', 'A-League', 'A-League', 'A-League')
df = tibble(team, competition)
# A tibble: 6 × 2
  team                  competition
  <chr>                 <chr>
1 Newcastle United      Premier League
2 Newcastle             Premier League
3 Newcastle Utd         Premier League
4 Newcastle United Jets A-League
5 Newcastle             A-League
6 Newcastle Jets        A-League
I also have a lookup table that specifies the desired team name per competition as follows:
old_name = c('Newcastle', 'Newcastle Utd', 'Newcastle', 'Newcastle United Jets')
new_name = c('Newcastle United', 'Newcastle United', 'Newcastle Jets', 'Newcastle Jets')
competition = c('Premier League', 'Premier League', 'A-League', 'A-League')
lookup = tibble(old_name, new_name, competition)
# A tibble: 4 × 3
  old_name              new_name         competition
  <chr>                 <chr>            <chr>
1 Newcastle             Newcastle United Premier League
2 Newcastle Utd         Newcastle United Premier League
3 Newcastle             Newcastle Jets   A-League
4 Newcastle United Jets Newcastle Jets   A-League
How can I recode/relabel team such that only the lookup entries for the relevant competition are used? I tried combining dplyr's group_by and recode in different ways, but no luck so far.
(My real data and lookup tables are much bigger and the data table includes cases that don't have a match in the lookup table.)
Desired output:
# A tibble: 6 × 2
  team             competition
  <chr>            <chr>
1 Newcastle United Premier League
2 Newcastle United Premier League
3 Newcastle United Premier League
4 Newcastle Jets   A-League
5 Newcastle Jets   A-League
6 Newcastle Jets   A-League
Asked Nov 16, 2024 at 21:53 by mrroy
Comment: Perhaps try one of the methods from stackoverflow/questions/67081496/… ? – jared_mamrot, Nov 16, 2024 at 22:37
1 Answer
An approach using full_join:
library(dplyr)
full_join(df, lookup, by = join_by(team == old_name, competition)) %>%
  mutate(team = coalesce(new_name, team), new_name = NULL)
# A tibble: 6 × 2
  team             competition
  <chr>            <chr>
1 Newcastle United Premier League
2 Newcastle United Premier League
3 Newcastle United Premier League
4 Newcastle Jets   A-League
5 Newcastle Jets   A-League
6 Newcastle Jets   A-League
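One caveat, given that the real lookup table is larger than the data: full_join keeps unmatched rows from both tables, so any lookup entry whose old_name/competition pair never occurs in df would show up as an extra row in the result. A possible variant, sketched here and not part of the original answer, is to use left_join so that only the rows of df are returned. The 'Sydney FC' row and the 'Man Utd' lookup entry below are hypothetical additions that exercise both unmatched cases, and the sketch assumes each old_name/competition pair appears at most once in the lookup.

```r
library(dplyr)  # join_by() needs dplyr >= 1.1.0

# Data from the question plus one hypothetical row with no lookup match
df <- tibble(
  team = c('Newcastle United', 'Newcastle', 'Newcastle Utd',
           'Newcastle United Jets', 'Newcastle', 'Newcastle Jets',
           'Sydney FC'),
  competition = c(rep('Premier League', 3), rep('A-League', 4))
)

# Lookup from the question plus one hypothetical entry that never occurs in df
lookup <- tibble(
  old_name = c('Newcastle', 'Newcastle Utd', 'Newcastle',
               'Newcastle United Jets', 'Man Utd'),
  new_name = c('Newcastle United', 'Newcastle United', 'Newcastle Jets',
               'Newcastle Jets', 'Manchester United'),
  competition = c('Premier League', 'Premier League', 'A-League',
                  'A-League', 'Premier League')
)

recoded <- df %>%
  # Match on team name AND competition; rows of df without a match get NA in new_name
  left_join(lookup, by = join_by(team == old_name, competition)) %>%
  # Use the new name where one was found, otherwise keep the original team name
  mutate(team = coalesce(new_name, team)) %>%
  select(-new_name)

recoded
```

Here the 'Sydney FC' row should pass through unchanged and the unused 'Man Utd' entry should not add a row, whereas full_join on the same inputs would append a 'Manchester United' row.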