Overcoming PyMongo’s bulk update handicap
Karthik (~karthik1)
Description:
PyMongo's performance degrades when a single bulk update covers a very large number of documents. This talk presents a solution that significantly improves performance.
Approach:
The MongoDB documentation recommends performing bulk operations of up to 1,000 records per request. If a request contains more than 1,000 operations, MongoDB automatically divides them into groups of 1,000 or fewer and processes each group. We were trying to reduce the number of network round trips to MongoDB and so improve performance. However, we observed that performance degraded when we performed a bulk operation on more than 100,000 documents.
To avoid this, instead of sending all the records at once, we wrote a function that divides the operations into smaller chunks of 1,000 records and sends each chunk to MongoDB separately. This change improved performance drastically.
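The chunking idea described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the talk: the helper names `chunked` and `bulk_write_in_chunks` are hypothetical, and the only PyMongo call it assumes is `Collection.bulk_write(requests, ordered=...)`.

```python
from itertools import islice


def chunked(operations, chunk_size=1000):
    """Yield successive lists of at most chunk_size items from operations."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(operations)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def bulk_write_in_chunks(collection, operations, chunk_size=1000, ordered=True):
    """Send operations to collection.bulk_write one chunk at a time.

    Hypothetical helper: `collection` is assumed to be a PyMongo Collection
    (or any object exposing a compatible bulk_write method). The default of
    1000 follows the chunk size mentioned in the text.
    """
    results = []
    for chunk in chunked(operations, chunk_size):
        # One request per chunk instead of one oversized request.
        results.append(collection.bulk_write(chunk, ordered=ordered))
    return results
```

With PyMongo, `operations` would typically be a list of write models such as `pymongo.UpdateOne(filter, update)`, and each entry in the returned list would be the result object of one `bulk_write` call.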
Although MongoDB 3.6 raised the write batch limit from 1,000 to 100,000 operations, our approach remains advantageous.
Conclusion:
Future bulk update work should take MongoDB's limits into account and adapt accordingly. It would be highly beneficial if database libraries applied this kind of chunking to bulk write operations by default.
Prerequisites:
Basic knowledge of Python, PyMongo, and MongoDB is helpful.
Video URL:
https://drive.google.com/file/d/16510HXDaL6Bn9-DrS6S7FFSz4tRo1Axm/view?usp=drive_link
Speaker Info:
My name is Karthik Nayak, and I currently work as a software developer at BNI India. With over six years of experience working with Python, I have a passion for exploring new technologies and building upon them.
Speaker Links:
https://www.linkedin.com/in/karthik-nayak-4419239b