Machine Learning Architecture
What are the steps that the whole process must take to be successful?
- Connect to CoreDb2 (geo-replication)
- On a fixed schedule, run logic to prepare the CSV files needed by the machine learning jobs.
- Store the files in blob storage.
- Publish a message to a queue announcing that a file was generated.
- Have a consumer of the queue (the machine learning logic) that runs when a file is ready and trains/re-trains the model.
- When training is done, save the model as a zip file and store it in blob storage.
- Expose the model through an API.
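The data-preparation and queueing steps above can be sketched locally. This is a minimal sketch, not the deployed implementation: a plain directory stands in for blob storage, a JSON string stands in for the storage-queue message, and all names (`prepare_participant_csv`, `file_ready_message`, the column names) are hypothetical.

```python
import csv
import json
from pathlib import Path

def prepare_participant_csv(participant_id, rows, out_dir):
    """Write one training CSV per participant and return its path.

    In the real pipeline this file would be uploaded to blob storage;
    here a local directory stands in for the container.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"participant-{participant_id}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["participant_id", "feature", "value"])  # hypothetical schema
        for feature, value in rows:
            writer.writerow([participant_id, feature, value])
    return path

def file_ready_message(path):
    """Build the queue message announcing that a training file is ready."""
    return json.dumps({"event": "file-generated", "blob_path": str(path)})
```

A consumer would deserialize the message, fetch the file at `blob_path`, and kick off training.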
Notes
The whole pipeline should be serverless so that it scales automatically with load.
Processing pipeline
- (cron) Azure Function | Azure Logic App | storage queue
  - creates Event Grid events
- (triggered) Azure Function
  - generates a data set for each participant
  - stores the data in blob storage
  - creates an event
- (triggered) Azure Function
  - trains on the new data
  - generates the trained model and stores it in blob storage
- API loads the appropriate zipped ML model for prediction
At this point, the Azure API endpoint should be able to pick up the latest model and use it to serve predictions.
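The "save the model as a zip" and "pick up the latest model" steps could look like the following sketch, assuming models are pickled into zero-padded, versioned zip files so that a lexicographic sort finds the newest one. A local directory again stands in for the blob container, and the function names are hypothetical.

```python
import pickle
import zipfile
from pathlib import Path

def save_model_zip(model, out_dir, version):
    """Pickle the trained model and store it as model-<version>.zip."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"model-{version}.zip"
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("model.pkl", pickle.dumps(model))
    return path

def load_latest_model(model_dir):
    """Pick the newest model-*.zip (zero-padded versions sort correctly)
    and unpickle the model inside it for serving."""
    zips = sorted(Path(model_dir).glob("model-*.zip"))
    if not zips:
        raise FileNotFoundError("no trained model available yet")
    with zipfile.ZipFile(zips[-1]) as zf:
        return pickle.loads(zf.read("model.pkl"))
```

The API layer would call `load_latest_model` (or a blob-storage equivalent) on startup or on a model-updated event, then keep the deserialized model in memory for predictions.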
Event topics
We need event topics so that the different publishers and consumers can be registered against each topic.
| Topic | Publisher | Consumer | Number of events | End product |
|---|---|---|---|---|
| INSIGHT-INIT-ML-PIPELINE-DEV | Azure Function (timer triggered) | Azure Function | Number of participants | Training data for each ML model (blob storage) |
| INSIGHT-PROCESS-PARTICIPANT-DATA-DEV | Blob storage | Azure Function | Number of files added to blob storage | Zipped ML model (blob storage) |
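The events published to these topics would follow the Event Grid event schema (`id`, `topic`, `subject`, `eventType`, `eventTime`, `data`, `dataVersion`). Below is a sketch of building such a payload for the INSIGHT-INIT-ML-PIPELINE-DEV topic; the `eventType` string, `subject` format, and `data` fields are hypothetical placeholders, not values from this document.

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(topic, event_type, subject, data):
    """Build a dict shaped like the Event Grid event schema."""
    return {
        "id": str(uuid.uuid4()),
        "topic": topic,
        "subject": subject,
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "data": data,
        "dataVersion": "1.0",
    }

# One event per participant, published by the timer-triggered function.
pipeline_event = make_event(
    topic="INSIGHT-INIT-ML-PIPELINE-DEV",
    event_type="Insight.ParticipantDataRequested",  # hypothetical event type
    subject="participants/42",                      # hypothetical subject format
    data={"participantId": "42"},
)
```

The consumer function would read `data` from the event to know which participant's data set to generate.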