Accurate 3D object model extraction is essential for a wide range of robotics applications, including grasping and object mapping, which require precise knowledge of an object's shape and location to perform well. However, high accuracy can be difficult to achieve on real-world data, where factors such as occlusion, clutter, and noise strongly affect the results. Several techniques for combining 2D deep learning with point cloud segmentation can be found in the literature; nevertheless, comparative studies of these algorithms remain scarce. To address this gap, this paper evaluates methods for obtaining 3D object models that combine deep learning object detection with point cloud segmentation. We compare a number of existing techniques, some of which have been improved for performance, on real-world data. More specifically, the paper examines four methods for 3D object extraction: two based on bounding-box object detection, one based on instance segmentation, and a fourth that estimates an object mask in the image inside the bounding box. We compare these techniques qualitatively and quantitatively using several criteria, providing insights into their strengths and limitations.
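To make the general setting concrete, the sketch below illustrates the simplest of the compared strategies: back-projecting the depth pixels that fall inside a detector's 2D bounding box to obtain a candidate 3D point set for the object. This is only a minimal illustration under assumed conventions (pinhole intrinsics, millimetre depth, a hypothetical box_to_points helper); it is not the paper's implementation of any of the four evaluated methods.

```python
import numpy as np

def box_to_points(depth, box, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project depth pixels inside a 2D bounding box into a 3D point set.

    depth       : (H, W) depth image (assumed in millimetres).
    box         : (x_min, y_min, x_max, y_max) detector bounding box in pixels.
    fx, fy, cx, cy : pinhole camera intrinsics (assumed known).
    depth_scale : divisor converting depth units to metres.
    """
    x_min, y_min, x_max, y_max = [int(v) for v in box]
    crop = depth[y_min:y_max, x_min:x_max].astype(np.float32) / depth_scale

    # Pixel grid of the cropped region, expressed in full-image coordinates.
    us, vs = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max))

    valid = crop > 0  # discard missing depth readings
    z = crop[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) points in the camera frame

if __name__ == "__main__":
    # Synthetic example: a flat surface 1 m from the camera and one detection box.
    depth = np.full((480, 640), 1000, dtype=np.uint16)
    pts = box_to_points(depth, box=(200, 150, 300, 250),
                        fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(pts.shape)  # (10000, 3)
```

A crop of this kind inevitably includes background and neighbouring-object points, which is precisely why the paper compares it against instance-segmentation and in-box mask-estimation alternatives that restrict the extraction to pixels belonging to the object.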