I have an 18,000-record dataset in the format below:

Date        Tm   Site  Opp  Player            Dist  Made  Blocked  GameID     Season
2024-01-07  ARI  H     SEA  Matt Prater       51    N     N        SEA @ ARI  2023
2024-01-07  DAL  A     WAS  Brandon Aubrey    50    Y     N        DAL @ WAS  2023
2024-01-07  TAM  A     CAR  Chase McLaughlin  57    Y     N        TAM @ CAR  2023
2024-01-07  CAR  H     TAM  Matthew Wright    52    N     N        TAM @ CAR  2023
2024-01-07  CHI  A     GNB  Cairo Santos      50    Y     N        CHI @ GNB  2023

There is data for 50 seasons. My goal for this part of my project is to calculate the number of attempts (each line is one attempt) per game (unique GameID) by season. My thought was the best route is to create a dataframe that has columns for season, attempts, games, and average per game.

I've run a calculation for attempts by using:

df.groupby(['Season']).size()

And unique games by using:

df.groupby('Season')['GameID'].nunique()

Each of these brings back a table by year, so I was thinking that I could create a dictionary with the three fields to build a new dataframe.

data = {"Year":df.groupby(['Season']), "FG":df.groupby(['Season']).size(), "Games":df.groupby('Season')['GameID'].nunique()}
dfgrp = pd.DataFrame(data)

But I get a very long error when I try to view dfgrp, where it stops iteration but doesn't identify what the issue is.

I've tried looking through multiple searches but there doesn't seem to be a matching question that addresses this issue. Am I going about this the wrong way?

asked Nov 15, 2024 at 21:39 by Abartel
  • Not sure exactly what you are trying to do. Something like this? out = df.groupby(['Season'], as_index=False).agg(FG=('Season', 'size'), Games=('GameID', 'nunique')) If you add what you expect the output to be, it would be easier to help. – iBeMeltin, Nov 15, 2024 at 21:55
  • I am also not sure what you want to do. Why don't you use groupby(by=['Season', 'GameID']).nunique()? From your post I assume that you need information about each GameID in the season, which is lost when using .agg(). – yellow_dot, Nov 16, 2024 at 15:33

1 Answer


You could skip a few steps by using named aggregation with df.groupby(...).agg(), which computes both counts in one pass:

df.groupby('Season').agg(size=('Season', 'size'),
                         nunique=('GameID', 'nunique'))

        size    nunique
Season      
  2023     5          4
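
To get all the way to the average per game that the question asks for, something like the sketch below might work. It is only an illustration built from the five sample rows in the question; the column names FG, Games, and AvgPerGame are my own choices, not anything pandas requires. It also shows that the dictionary approach from the question can work once the "Year" entry is dropped: df.groupby(['Season']) on its own is a GroupBy object rather than a column of values, and the two Series already carry Season as their index.

```python
import pandas as pd

# Sample rows from the question: five attempts across four distinct games.
df = pd.DataFrame({
    "Season": [2023, 2023, 2023, 2023, 2023],
    "GameID": ["SEA @ ARI", "DAL @ WAS", "TAM @ CAR", "TAM @ CAR", "CHI @ GNB"],
})

# Named aggregation: one row per season with attempts and unique games.
summary = (
    df.groupby("Season", as_index=False)
      .agg(FG=("GameID", "size"), Games=("GameID", "nunique"))
)

# Average attempts per game (hypothetical column name).
summary["AvgPerGame"] = summary["FG"] / summary["Games"]

# Alternative: the dictionary approach from the question, without the
# "Year" entry. Both Series are indexed by Season, so they align on it.
attempts = df.groupby("Season").size()
games = df.groupby("Season")["GameID"].nunique()
alt = pd.DataFrame({"FG": attempts, "Games": games})
alt["AvgPerGame"] = alt["FG"] / alt["Games"]

print(summary)
print(alt)
```

With as_index=False, Season comes back as an ordinary column; in the dictionary version it stays as the index, and alt.reset_index() would turn it into a column if needed.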
