In the agricultural industry, autonomous robots have been adopted to reduce labor-intensive tasks. Traditional chili harvesting often requires a significant workforce, and the grading stage in particular can be improved, since visual inspection by human eyes is prone to error. In addition, chili fruits differ significantly from other fruit types in their size, variation, texture, and location on the plant. To address this, an investigation was conducted using the You Only Look Once (YOLO) object detection algorithm to localize and classify chili fruit variations. A dataset of 300 chili fruit images, each with a resolution of 640 x 640 pixels, was utilized; 270 images were allocated for training the model and the remaining 30 for testing. By leveraging the Convolutional Neural Network (CNN) backbone of the YOLO architecture, the algorithm successfully classified chili fruits into three categories: green chili, red chili, and rotten chili, according to their color and texture. The YOLO model achieved a detection and classification accuracy above 93%. For deployment on an agricultural robot, localization estimation using a monocular camera was also necessary. These results are crucial for the effective implementation of agricultural robots in chili harvesting and grading processes.
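The abstract notes that localization with a monocular camera is needed for the harvesting robot. One common way to obtain a rough distance estimate from a single camera is the pinhole model, Z = f * W / w, combining a detected bounding-box width with a known object width. The sketch below illustrates this idea; the function name and all numeric values (focal length, chili width) are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch of monocular distance estimation via the pinhole camera
# model: Z = f * W / w, where f is the focal length in pixels, W a known
# real-world object width, and w the detected bounding-box width in pixels.
# All numbers below are hypothetical, not taken from the study.

def estimate_distance(focal_px: float, real_width_m: float, bbox_width_px: float) -> float:
    """Estimate camera-to-object distance in meters from a bounding-box width."""
    if bbox_width_px <= 0:
        raise ValueError("bounding-box width must be positive")
    return focal_px * real_width_m / bbox_width_px

# Example: a chili assumed 0.02 m wide, imaged at 50 px wide with an
# assumed 500 px focal length, would lie about 0.2 m from the camera.
print(estimate_distance(500.0, 0.02, 50.0))
```

In practice the focal length would come from camera calibration, and the bounding-box width from the YOLO detector's output for each chili.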