This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Python-based data analysis project for KUNI 2thecore that provides data analysis reports for car rental companies. The project analyzes rental car usage patterns, brand preferences, seasonal trends, and location optimization using MySQL database connectivity and advanced data analysis capabilities.
The system provides data-driven insights to rental car companies using two main data sources:
- car table: Vehicle information including model, brand, year, type, status, and location
- drivelog table: Driving records with trip details, timestamps, locations, and events
It supports four analyses:
- Seasonal Brand/Vehicle Preference Analysis - Monthly and seasonal analysis of preferred brands and vehicles with seasonality evaluation
- Trend Analysis - Year-over-year changes in brand and vehicle preferences
- Daily Vehicle Usage Forecasting - Daily operational vehicle count analysis with SARIMA time series forecasting (1 week to 1 month ahead)
- Location Clustering Analysis - Regional importance quantification for optimal rental location placement
The project requires a virtual environment with dependencies installed from requirements.txt:
# Install dependencies
pip install -r requirements.txt
# Verify setup
python verify_setup.py
The project uses MySQL with SQLAlchemy and requires a .env file with the following database credentials:
- DB_HOST
- DB_USER
- DB_PASSWORD
- DB_NAME
- DB_PORT
Database connection and data loading functionality is centralized in src/data_loader.py.
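The contents of src/data_loader.py are not shown here, so the following is only a minimal sketch of how the DB_* settings could be turned into a SQLAlchemy connection URL. The helper name build_mysql_url, the default port, and the choice of the mysql+mysqlconnector dialect (suggested by the mysql-connector-python dependency) are assumptions.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=None):
    """Build a SQLAlchemy-style MySQL URL from the DB_* settings (hypothetical helper)."""
    # Assumption: the .env values are already loaded into the environment,
    # for example by python-dotenv's load_dotenv().
    env = os.environ if env is None else env
    user = quote_plus(env["DB_USER"])
    # Percent-encode the password so characters such as '@' or '/' stay valid in a URL.
    password = quote_plus(env["DB_PASSWORD"])
    host = env["DB_HOST"]
    port = env.get("DB_PORT", "3306")  # assumed fallback: MySQL's default port
    name = env["DB_NAME"]
    return f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{name}"


example_env = {
    "DB_HOST": "localhost",
    "DB_USER": "analyst",
    "DB_PASSWORD": "p@ss/word",
    "DB_NAME": "rental",
    "DB_PORT": "3307",
}
example_url = build_mysql_url(example_env)
print(example_url)
```

A URL of this shape is what sqlalchemy.create_engine() accepts; whether get_db_connection() builds it this way is not confirmed by this document.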
car table columns:
- car_id (Primary Key): Unique vehicle identifier
- model: Vehicle model name, e.g., K3, 투싼 (Tucson), 아이오닉 5 (Ioniq 5)
- brand: Manufacturer: 기아 (Kia), 현대 (Hyundai), 제네시스 (Genesis)
- status: Current status (IDLE, MAINTENANCE)
- car_year: Manufacturing year
- car_type: Vehicle category: 소형 (compact), 중형 (midsize), 대형 (full-size), SUV
- car_number: License plate number
- sum_dist: Total distance traveled (km)
- login_id: Associated user/company ID
- last_latitude, last_longitude: Current GPS coordinates
drivelog table columns:
- drive_log_id (Primary Key): Unique trip identifier
- car_id (Foreign Key): References car table
- drive_dist: Trip distance (km)
- start_point, end_point: Location names
- start_latitude, start_longitude: Trip origin coordinates
- end_latitude, end_longitude: Trip destination coordinates
- start_time, end_time: Trip timestamps
- created_at: Record creation timestamp
- model, brand: Denormalized vehicle information
- memo: Trip notes/events, e.g., 급감속 (sudden deceleration), 과열 경고 (overheating warning)
- status: Trip status
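Because drivelog denormalizes only model and brand, any breakdown by car_type needs a join back to car on car_id. The sketch below illustrates that join using an in-memory SQLite database as a stand-in for MySQL; the column subset and all rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database; only a few columns are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT, brand TEXT, car_type TEXT);
CREATE TABLE drivelog (
    drive_log_id INTEGER PRIMARY KEY,
    car_id INTEGER REFERENCES car(car_id),
    drive_dist REAL,
    start_time TEXT
);
-- Illustrative rows only, not real project data.
INSERT INTO car VALUES (1, 'K3', '기아', '소형'), (2, '투싼', '현대', 'SUV');
INSERT INTO drivelog VALUES
    (10, 1, 12.5, '2024-01-05 09:00:00'),
    (11, 1, 7.5, '2024-01-06 10:00:00'),
    (12, 2, 30.0, '2024-01-06 11:00:00');
""")

# Trips and total distance per vehicle category, joined through car_id.
TRIPS_PER_TYPE = """
SELECT c.car_type, COUNT(*) AS trips, SUM(d.drive_dist) AS total_km
FROM drivelog AS d
JOIN car AS c ON c.car_id = d.car_id
GROUP BY c.car_type
"""
summary = {car_type: (trips, km) for car_type, trips, km in conn.execute(TRIPS_PER_TYPE)}
print(summary)
```

The same SELECT statement could be passed to get_data_from_db(query) against the real database, assuming the table and column names listed above.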
- app.py - Main Flask application with REST API endpoints and Swagger documentation
- run_server.py - Production server runner script
- src/data_loader.py - Database connection and data loading utilities
  - get_db_connection() - Creates SQLAlchemy engine from .env credentials
  - get_data_from_db(query) - Executes SQL queries and returns pandas DataFrames
- verify_setup.py - Environment verification script for testing library imports
- src/simple_preference_analysis.py - Seasonal brand/vehicle preference analysis with sklearn
- src/simple_trend_analysis.py - Year-over-year trend analysis
- src/services/daily_forecast.py - Daily usage forecasting with SARIMA time series models
- src/services/region_clustering.py - Geographic clustering analysis
- src/utils/cache.py - Result caching utilities for performance optimization
- src/data_quality.py - Data validation and quality checks
- src/visualization_enhanced.py - Advanced visualization utilities
- src/statistical_tests.py - Statistical analysis tools
- src/utils/font_config.py - Cross-platform Korean font configuration for matplotlib
- requirements.txt - Full data science stack dependencies
- ubuntu_setup.sh - Ubuntu server deployment setup script
Common development commands for this data analysis project:
# Verify environment setup (import test)
python verify_setup.py
# Test database connection and data loading
python src/data_loader.py
# Start Flask web server (development)
python app.py
# or (preferred production runner)
python run_server.py
# Start Jupyter for interactive analysis
jupyter lab
# Test individual analysis modules
python -m src.simple_preference_analysis
python -m src.simple_trend_analysis
python -m src.services.daily_forecast
python -m src.services.region_clustering
The project includes a Flask-based REST API server for data analysis:
- Main Application: app.py - Core Flask application with REST API endpoints
- Server Runner: run_server.py - Development server startup script
- Base URL: http://localhost:5000
Core System Endpoints:
- GET / - API information and available endpoints
- POST /api/data - Execute SQL queries and return results as JSON
- GET /api/health - Health check and database connectivity status
Data Analysis Endpoints:
- GET /api/analysis/period - Monthly/seasonal brand and vehicle preference analysis (implemented in src/simple_preference_analysis.py)
- GET /api/analysis/trend - Year-over-year brand/vehicle preference trends (implemented in src/simple_trend_analysis.py)
- GET /api/forecast/daily - Daily vehicle usage counts with forecasting (implemented in src/services/daily_forecast.py)
- GET /api/clustering/regions - Regional importance analysis for location optimization (implemented in src/services/region_clustering.py)
Note: All analysis endpoints are documented with Swagger via flasgger. Interactive API docs are available at /apidocs/ while the server is running.
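For scripted access, the endpoints can also be called from Python. The sketch below uses only the standard library and sends the same JSON body shape, {"query": ...}, that POST /api/data expects from curl. The helper names build_query_request and fetch_json are hypothetical, and the response format is not assumed beyond being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_query_request(sql, base_url=BASE_URL):
    """Prepare (but do not send) a POST /api/data request carrying one SQL query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/data",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout=10):
    """Send a prepared request (or a plain GET URL) and decode the JSON body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_query_request("SELECT * FROM car LIMIT 5;")
# With the server running:
#   rows = fetch_json(req)
#   health = fetch_json(f"{BASE_URL}/api/health")
```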
# Health check
curl http://localhost:5000/api/health
# Execute a custom SQL query
curl -X POST http://localhost:5000/api/data \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM car LIMIT 5;"}'
# Get seasonal preference analysis
curl http://localhost:5000/api/analysis/period
# Get trend analysis
curl http://localhost:5000/api/analysis/trend
# Get daily usage forecast
curl http://localhost:5000/api/forecast/daily
# Access Swagger documentation
# Open browser to: http://localhost:5000/apidocs/
Purpose: Analyze monthly and seasonal patterns in brand and vehicle preferences
- Calculate preference ratios by month/season for each brand: 기아 (Kia), 현대 (Hyundai), 제네시스 (Genesis)
- Evaluate seasonality strength using statistical measures
- Identify peak seasons for specific vehicle types: 소형 (compact), 중형 (midsize), 대형 (full-size), SUV
- Generate seasonal preference rankings and trends
Key Metrics:
- Monthly/seasonal rental frequency by brand and model
- Seasonality index calculation
- Brand market share variations across seasons
- Vehicle type preference patterns
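The exact seasonality index used by the project is not specified here. One common and simple definition, assumed in this sketch, is each month's trip count divided by the average monthly count, so that 1.0 marks an average month. Plain Python is used to keep the example dependency-free; the project itself works with pandas, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(start_times):
    """Count trips per calendar month (1-12) from 'YYYY-MM-DD HH:MM:SS' strings."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").month for ts in start_times
    )
    # Include months with no trips so every month has an entry.
    return {month: counts.get(month, 0) for month in range(1, 13)}


def seasonality_index(counts):
    """Ratio of each month's count to the average month (assumed definition)."""
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        return {month: 0.0 for month in counts}
    return {month: count / mean for month, count in counts.items()}


# Made-up drivelog.start_time values for one brand.
sample = ["2024-07-01 09:00:00"] * 3 + ["2024-01-15 10:00:00"]
index = seasonality_index(monthly_counts(sample))
peak_month = max(index, key=index.get)
print(peak_month, index[peak_month])
```

Running the same calculation once per brand or per car_type would give the per-group peak months described above.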
Purpose: Track long-term changes in brand and vehicle preferences over multiple years
- Compare brand preference shifts year-over-year
- Identify emerging trends in vehicle type preferences
- Analyze market share evolution for each manufacturer
- Detect preference pattern changes over time
Key Metrics:
- Annual growth rates by brand
- Market share trend analysis
- Vehicle type adoption patterns
- Preference volatility measurements
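As a rough illustration of the first two metrics, the sketch below computes market share within a year and year-over-year growth for one brand. The function names and the input shape (plain dicts of trip counts) are assumptions, and the numbers are made up.

```python
def market_share(counts_by_brand):
    """Share of total trips per brand within one period."""
    total = sum(counts_by_brand.values())
    return {
        brand: (count / total if total else 0.0)
        for brand, count in counts_by_brand.items()
    }


def yoy_growth(counts_by_year):
    """Growth rate of each year relative to the previous year present in the data.

    Returns None for a year whose predecessor has a zero count,
    since the growth rate is undefined in that case.
    """
    years = sorted(counts_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        base = counts_by_year[prev]
        growth[curr] = (counts_by_year[curr] - base) / base if base else None
    return growth


# Made-up annual trip counts for a single brand.
growth = yoy_growth({2022: 100, 2023: 120, 2024: 90})
shares = market_share({"기아": 50, "현대": 30, "제네시스": 20})
print(growth, shares)
```

Note that yoy_growth compares each year with the previous year that appears in the input, so a gap in the data would silently span more than one year.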
Purpose: Provide operational insights and future demand prediction
- Analyze daily operational vehicle counts over specified periods
- Generate visual graphs of usage patterns
- Forecast 1 week to 1 month ahead with SARIMA time series models
- Create interactive dashboards with forecast visualizations
Key Features:
- Real-time daily usage tracking
- Seasonal adjustment in forecasting models
- Confidence intervals for predictions
- Visual trend analysis with matplotlib/seaborn
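SARIMA models expect a regularly spaced series, so the daily counts need one value per calendar day, including days with no trips. The sketch below prepares such a series in plain Python. Treating a vehicle as "operating" on a day if it has at least one trip starting that day is an assumption, as is the function name.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_vehicle_counts(trips, start, end):
    """Count distinct car_ids with a trip starting on each day in [start, end]."""
    cars_by_day = defaultdict(set)
    for car_id, start_time in trips:
        day = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S").date()
        cars_by_day[day].add(car_id)  # a set ignores repeat trips by the same car

    counts = {}
    day = start
    while day <= end:
        # Days without any trip are filled with 0 to keep the series regular.
        counts[day] = len(cars_by_day.get(day, ()))
        day += timedelta(days=1)
    return counts


# Made-up (car_id, start_time) pairs from drivelog.
trips = [
    (1, "2024-03-01 08:00:00"),
    (1, "2024-03-01 18:00:00"),
    (2, "2024-03-01 09:00:00"),
    (2, "2024-03-03 09:00:00"),
]
series = daily_vehicle_counts(trips, date(2024, 3, 1), date(2024, 3, 3))
print(series)
```

A series like this could then be handed to a seasonal model such as the SARIMAX class in statsmodels, typically with a weekly seasonal period for daily data; the model orders the project actually uses are not stated here.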
Purpose: Optimize rental location placement through data-driven regional analysis
- Perform clustering analysis on trip start/end coordinates
- Quantify regional importance based on usage density
- Identify optimal locations for new rental stations
- Analyze geographical usage patterns and accessibility
Key Techniques:
- K-means clustering on geographical coordinates
- Regional demand density analysis
- Distance-based accessibility scoring
- Heatmap visualizations for location insights
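To make the clustering step concrete, here is a dependency-free sketch of K-means (Lloyd's algorithm) on latitude/longitude pairs, followed by a simple importance score defined as each cluster's share of trips. The project lists scikit-learn, so its real implementation likely differs; the fixed starting centroids, the share-based score, and the use of plain Euclidean distance on degrees (a rough approximation over small areas) are all assumptions.

```python
import math
from collections import Counter


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm on (lat, lon) pairs, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


# Made-up trip start coordinates forming two well-separated groups.
points = [(37.50, 127.00), (37.52, 127.02), (35.10, 129.00), (35.12, 129.02)]
centroids, labels = kmeans(points, [(37.0, 127.0), (35.0, 129.0)])

# Assumed importance score: fraction of all trips that fall in each cluster.
importance = {k: n / len(labels) for k, n in Counter(labels).items()}
print(centroids, importance)
```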
The project includes a full data science stack:
- Database: SQLAlchemy, mysql-connector-python, python-dotenv
- Data Analysis: pandas, numpy, scipy, statsmodels
- Machine Learning: scikit-learn (clustering, regression, preprocessing)
- Visualization: matplotlib, seaborn (base64 encoding for API responses)
- Web Server: Flask, Flask-CORS, Flask-RESTful, flasgger (Swagger), Werkzeug
- Jupyter: jupyterlab, ipykernel for interactive development
- Development: All supporting libraries for data analysis workflows
- Cross-Platform Korean Font Support: matplotlib configured with Korean fonts (Malgun Gothic for Windows, Noto Sans CJK KR for Linux, Apple SD Gothic Neo for macOS)
- Result Caching: Analysis endpoints use the @cache_result decorator for performance
- Base64 Visualizations: Charts returned as base64 strings for API consumption
- Error Handling: Comprehensive error handling with structured JSON responses
- Swagger Documentation: All endpoints documented and accessible at /apidocs/
- Database Requirements: Requires .env file with MySQL connection parameters
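The implementation of the caching decorator in src/utils/cache.py is not shown in this document. As a sketch of the general idea, the version below caches results per argument set for a fixed time. The ttl_seconds parameter, the in-memory dict store, and the slow_analysis function are assumptions for illustration only.

```python
import functools
import time


def cache_result(ttl_seconds=300):
    """Cache a function's return value per argument set for ttl_seconds."""
    def decorator(func):
        store = {}  # maps argument key -> (timestamp, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Arguments must be hashable to serve as a cache key.
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            hit = store.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args, **kwargs)
            store[key] = (now, value)
            return value

        return wrapper
    return decorator


calls = []


@cache_result(ttl_seconds=60)
def slow_analysis(year):
    """Stand-in for an expensive analysis; records each real execution."""
    calls.append(year)
    return {"year": year}


slow_analysis(2024)
slow_analysis(2024)  # served from the cache, so calls is unchanged
slow_analysis(2023)
print(calls)
```

A per-process dict like this is not shared between gunicorn workers, so each worker would warm its own cache.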
For Ubuntu server deployment, use the provided setup script:
# Make setup script executable
chmod +x ubuntu_setup.sh
# Run Ubuntu setup script
./ubuntu_setup.sh
The setup script will install:
- Korean Fonts: fonts-noto-cjk, fonts-nanum, fonts-liberation
- System Packages: python3, python3-pip, python3-venv, build-essential
- MySQL Client: libmysqlclient-dev for database connectivity
# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and dependencies
sudo apt install -y python3 python3-pip python3-venv python3-dev
sudo apt install -y build-essential libmysqlclient-dev pkg-config
# Install Korean fonts (Critical for chart generation)
sudo apt install -y fonts-noto-cjk fonts-nanum fonts-liberation
sudo fc-cache -fv
# Verify Korean fonts are installed
fc-list :lang=ko
# Create virtual environment and install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Create .env file with database credentials
# DB_HOST, DB_USER, DB_PASSWORD, DB_NAME, DB_PORT
# Run server
python run_server.py
For production servers, use gunicorn:
# Install gunicorn
pip install gunicorn
# Run production server
gunicorn -w 4 -b 0.0.0.0:5000 app:app
# Or with supervisor for process management
sudo apt install supervisor
# Configure supervisor with provided config
Important: Korean font installation is critical for proper chart generation. Without Korean fonts, chart text will appear as squares or question marks.
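The per-platform font choice can be sketched as a simple lookup keyed on platform.system(). The font names come from this document; the function name and the fallback behavior are assumptions, and the actual logic in src/utils/font_config.py may differ.

```python
import platform

# Font names as listed in this document, keyed by platform.system() values.
KOREAN_FONTS = {
    "Windows": "Malgun Gothic",
    "Linux": "Noto Sans CJK KR",
    "Darwin": "Apple SD Gothic Neo",  # platform.system() reports macOS as "Darwin"
}


def pick_korean_font(system=None):
    """Return the Korean font name for the given OS, or None if the OS is unknown."""
    system = platform.system() if system is None else system
    return KOREAN_FONTS.get(system)


font = pick_korean_font()
print(font)
# To apply it in matplotlib (not imported here to keep the sketch dependency-free):
#   matplotlib.rcParams["font.family"] = font
#   matplotlib.rcParams["axes.unicode_minus"] = False  # keep minus signs renderable
```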