Child Abuse Detection Model
- Seo Je-in (Department of Computer Science, College of AI Convergence)
- October 1
- 3 min read
Last modified: October 10
With the recent rise of dual-income households, children are spending increasingly long hours in childcare facilities such as daycare centers. However, incidents of child abuse within these facilities continue to surface and have become a major social issue.
In response, we chose as our project theme the development of a motion-recognition program for detecting physical abuse by substitute caregivers, with the goal of preventing such incidents before they occur.
As a follow-up measure to the “Incheon Daycare Child Abuse Case,” CCTV cameras were installed in 30,884 daycare centers nationwide. However, these systems only function as post-incident measures rather than preventive tools.
Research indicates that individuals who experienced child abuse are 1.4 times more likely to die by suicide than those who did not. This finding highlights the need to move beyond reactive measures and adopt preventive tools capable of identifying abuse in real time.
Through a child abuse motion-recognition CCTV algorithm, this project aims to detect abusive behavior in real time and ensure the safety and protection of children.
Model Development and Analysis Summary
1. Limitations of the Existing Model
The team initially examined a violence detection model based on video segments.
Dataset: 1,000 non-violent and 1,000 violent video clips, each 5–7 seconds long (from a Kaggle dataset)
Data Split: Train 72%, Validation 18%, Test 10%
Model Used: MobileNetV2 combined with LSTM for class probability prediction
Performance: The model achieved 90.5% accuracy on the test dataset
However, the model showed limitations in real-world applications due to low robustness and overfitting to the dataset.
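A 72/18/10 split like the one above can be produced with two successive `train_test_split` calls: first hold out 10% for testing, then split the remaining 90% into 80/20. This is a minimal sketch with placeholder clip indices; the project's actual clip loading and preprocessing are not shown in the post.

```python
from sklearn.model_selection import train_test_split

# Placeholder indices standing in for 1,000 non-violent + 1,000 violent clips.
videos = list(range(2000))
labels = [0] * 1000 + [1] * 1000

# Hold out 10% of the data as the test set first.
train_val, test, y_train_val, y_test = train_test_split(
    videos, labels, test_size=0.10, stratify=labels, random_state=42)

# Split the remaining 90% into 80/20, which is 72%/18% of the whole.
train, val, y_train, y_val = train_test_split(
    train_val, y_train_val, test_size=0.20, stratify=y_train_val, random_state=42)

print(len(train), len(val), len(test))  # 1440 360 200
```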
2. Model 2 – Improved Approach
To enhance performance, a new architecture was designed using VGG16 features and logistic regression.
Feature Extraction: Used VGG16 up to the fc2 layer (the second fully connected layer)
Sequence Data Construction: Frames from each video were flattened and structured as sequential data
Classification: The generated sequence data were trained using a logistic regression model
This approach allowed temporal patterns to be considered while maintaining efficient computation.
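The pipeline above can be sketched as follows. To keep the snippet self-contained, random vectors stand in for the 4096-dimensional fc2 features that VGG16 would produce per frame, so it runs without a deep-learning framework; the sizes (`n_videos`, `frames_per_seq`) are illustrative, not the project's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_videos, frames_per_seq, feat_dim = 100, 3, 4096  # 4096 = VGG16 fc2 width

# Stand-in for per-frame fc2 features: one (frames, 4096) block per video.
features = rng.normal(size=(n_videos, frames_per_seq, feat_dim))
labels = rng.integers(0, 2, size=n_videos)

# Flatten each sequence of frame features into a single vector,
# then classify the sequence with logistic regression.
X = features.reshape(n_videos, frames_per_seq * feat_dim)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(X.shape)  # (100, 12288)
```

Flattening keeps frame order inside the vector, which is how temporal patterns can still influence a linear classifier at low computational cost.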
3. Model 2 – Training and Testing
Experiments were conducted by varying the number of images per sequence (1, 3, 5, 7).
The VGG16-based model achieved its best test accuracies at the following sequence lengths:
1 image: 93.94%
3 images: 93.94%
5 images: 93.33%
This represented a major improvement from 54% to 93.94% accuracy, indicating significantly better performance and stability.
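The sweep over sequence lengths can be sketched like this, with synthetic linearly separable features in place of the real VGG16 outputs (illustrative only; the printed accuracies describe the synthetic data, not the reported results).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
feat_dim = 32  # small stand-in for the 4096-dim fc2 features

results = {}
for seq_len in (1, 3, 5, 7):  # images per sequence, as in the experiment
    # Synthetic features whose mean shifts with the label, so the two
    # classes are separable and the sweep has something to measure.
    y = rng.integers(0, 2, size=300)
    X = rng.normal(size=(300, seq_len, feat_dim)) + y[:, None, None]
    X = X.reshape(300, -1)  # flatten each sequence into one vector
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    results[seq_len] = acc
    print(seq_len, round(acc, 3))
```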
4. Final Model Application Examples
The final trained model was tested on various video clips to demonstrate its practical application.
Scenario Examples:
Detecting potential violent interactions in indoor environments (e.g., hallways, doorways).
The model successfully detected frames where physical contact or aggressive behavior occurred.
Non-violent interactions (e.g., helping gestures) were correctly classified as non-violent.
System Output Example:
The program logs the timestamps of detected suspicious frames, listing frame indices where potential violence was found.
For instance: “Detected suspicious actions in 39 frames” with corresponding frame numbers displayed.
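Logging of this kind could look like the following hypothetical `report_suspicious` helper, which collects the indices and timestamps of frames whose predicted violence probability crosses a threshold (the post does not show the actual logging code).

```python
def report_suspicious(frame_probs, threshold=0.5, fps=30):
    """Return (frame_index, timestamp_seconds) for each flagged frame."""
    flagged = [i for i, p in enumerate(frame_probs) if p >= threshold]
    print(f"Detected suspicious actions in {len(flagged)} frames")
    return [(i, i / fps) for i in flagged]

# Example with four per-frame probabilities from a classifier.
hits = report_suspicious([0.1, 0.9, 0.2, 0.8], fps=30)
print(hits)
```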


5. Real-World Testing
The model was further evaluated on videos outside the dataset, such as publicly available clips simulating real-world conditions. It successfully identified scenes involving possible child abuse or harmful contact, demonstrating its adaptability to unseen data.
Summary
The improved model based on VGG16 feature extraction + Logistic Regression demonstrated high accuracy and robustness.
Compared to the existing MobileNetV2-LSTM model, it achieved greater generalization, better interpretability, and faster inference.
The system shows potential for real-time violence detection and could be applied in CCTV monitoring or child protection systems.
