Basketball Experiment

Hi all,

I’m a bit of a sports nut and in the final year of a computer science degree. I have a small machine learning project and, while I understand some of the concepts, I struggle with implementation.

My Data:

  • All NBA Basketball games for the last 3 years
    • Results, etc
  • Player statistics - height, etc
    • Statistics per game
  • Speciality stats
    • Was travel involved between games (Distance)

What I don’t understand is how to structure this. I want to build a model that I can give the teams and their available players to, and have it predict the outcome of the game. I have been looking into linear regression for this.
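
To make the structuring question concrete, here is a minimal sketch of one possible layout: one row per game, with features known before tip-off and the final point margin as the target a linear regression would predict. The field names (`home_avg_points`, `travel_km_away`, and so on) and all the numbers are made up for illustration; they are assumptions, not columns from the actual dataset.

```python
from dataclasses import dataclass


@dataclass
class GameRow:
    """One training example per game (all field names are hypothetical)."""
    home_team: str
    away_team: str
    home_avg_points: float  # home team's scoring average before this game
    away_avg_points: float  # away team's scoring average before this game
    travel_km_away: float   # distance the away team travelled to the game
    point_margin: float     # target: home score minus away score


def to_features(row):
    """Turn one game into a numeric feature vector and a regression target."""
    features = [
        row.home_avg_points - row.away_avg_points,  # relative strength
        row.travel_km_away,                         # travel burden
    ]
    return features, row.point_margin


# Made-up example rows, only to show the shape of the data.
games = [
    GameRow("LAL", "BOS", 112.0, 108.5, 4170.0, 7.0),
    GameRow("BOS", "LAL", 109.0, 111.0, 4170.0, -3.0),
]

X = [to_features(g)[0] for g in games]  # feature matrix, one row per game
y = [to_features(g)[1] for g in games]  # targets, one value per game
```

The important design point is that every feature must be computed only from games played before the one being predicted, otherwise the model sees the answer during training.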

How do I uniquely identify a team? At the moment I have coded each team with a unique identifier (e.g. LAL = Lakers). Does the model learn that the statistics for LAL are specific to their own performances? How do I structure the data / model so that the model knows “when LAL plays, the prediction is informed by LAL’s historic performances”?
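
On the identifier question, one common option is one-hot encoding. If teams are coded as plain numbers (LAL = 1, BOS = 2, ...), a linear model treats the code as a quantity with an order, which is meaningless for team names. With one 0/1 column per team, the model can learn a separate coefficient for each team. The sketch below assumes a three-team subset and hypothetical helper names (`one_hot`, `encode_matchup`).

```python
TEAMS = ["BOS", "LAL", "NYK"]  # illustrative subset, not the full league


def one_hot(team, teams):
    """Return a 0/1 vector with a single 1 at the team's position."""
    if team not in teams:
        raise ValueError(f"unknown team: {team}")
    return [1.0 if team == t else 0.0 for t in teams]


def encode_matchup(home, away, teams=TEAMS):
    """Concatenate the home-team and away-team one-hot vectors."""
    return one_hot(home, teams) + one_hot(away, teams)


# First three slots describe the home team, last three the away team.
lal_hosts_bos = encode_matchup("LAL", "BOS")
```

These columns would sit alongside the numeric features for each game. The per-team history itself still has to come from features built out of that team's earlier games; the identifier alone only gives the model a fixed per-team offset.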

Also, how does a model go about weighting performances? For example, a team’s results in the last 5 games are more important than what happened 4 months ago.
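
A plain linear regression does not weight recent games by itself; one way to get that effect is to build recency into the features. The sketch below shows two options under stated assumptions: a simple average over the last 5 games, and an exponentially decayed average where a game's weight halves every `half_life` games. The function names and the half-life value are illustrative choices, not a recommendation.

```python
def last_n_mean(values, n=5):
    """Average of the most recent n values (values ordered oldest to newest)."""
    if not values:
        raise ValueError("need at least one past game")
    recent = values[-n:]
    return sum(recent) / len(recent)


def recency_weighted_mean(values, half_life=5.0):
    """Exponentially decayed average; the newest game has weight 1.

    A game played `half_life` games before the newest one gets weight 0.5,
    one played 2 * half_life games before gets 0.25, and so on.
    """
    if not values:
        raise ValueError("need at least one past game")
    n = len(values)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up win (1) / loss (0) history, oldest first: a poor start, a strong finish.
history = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
form_window = last_n_mean(history, n=5)
form_decayed = recency_weighted_mean(history, half_life=5.0)
```

Either value would be computed for each team from the games before the one being predicted and then used as an input feature, so the model sees "current form" rather than having to discover the time weighting itself.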

Fascinating concepts. Loving it and really appreciative of anyone who is prepared to give me a bit of their time to help understand this. Any notebooks with similar concepts would be greatly appreciated.