Too Long; Didn't Read
There is a disconnect between machine learning done in Jupyter notebooks on local machines and models actually being served to end users. Getting ML projects from research to production is hard, and ML in production has a long way to go to catch up with the quality standards of more conventional software development. Data has to be treated with the same care that most developers give to the code they write. Resources (compute, GPUs, etc.) are scattered and not being used efficiently. There are no proper CI/CD pipelines and no proper monitoring in production (for example, of changes in data quality). Scaling is hard, both in training and in serving. Machine learning compute works in spikes, so systems need to be equipped to deal with that.