X Scraper 0_0

A comprehensive web application for scraping Twitter (X) data with an intuitive UI, database integration, and advanced scraping features.

Features

Comprehensive Scraping Options

Search Tweets: Find tweets containing specific keywords
Hashtag Tweets: Scrape tweets containing specific hashtags
User Tweets: Collect tweets from specific Twitter users, including their tweets, replies, media, and likes
Date Range Search: Search for tweets within a specific time period

Advanced Functionality

Job Management System: Track and monitor all scraping jobs
Database Integration: Save all scraped tweets to a MySQL database for permanent storage
Rate Limit Handling: Built-in rate limit tracking to avoid hitting Twitter API limits
Pagination Support: Automatically paginates through results to collect the requested number of tweets
Full Tweet Metadata: Captures comprehensive tweet data including:
- Reply counts
- Retweet counts
- Bookmark counts
- Hashtags
- Creation timestamps
- User information

User Interface

Modern UI: Clean, responsive interface built with Next.js and Tailwind CSS
Job Dashboard: View all scrape jobs and their results
Rate Limit Indicators: Visual indicators for API rate limits

Project Structure

twitter-scraper/
├── twitter-scraper-app/   # Next.js frontend application
│   ├── public/            # Static assets
│   └── src/               # Application source code
│       ├── app/           # Pages and routes
│       │   ├── api/       # API routes
│       │   │   ├── jobs/  # Job management API
│       │   │   └── scrape/ # Scraping API endpoints
│       │   ├── date-range/ # Date Range search page
│       │   ├── hashtag/   # Hashtag search page
│       │   ├── jobs/      # Jobs overview page
│       │   ├── search/    # General search page
│       │   └── user/      # User tweets page
│       ├── components/    # React components
│       └── utils/         # Utility functions
├── initialize_db.py       # Database initialization script
├── scraper_api.py         # Python API bridge for frontend
├── db_interface.py        # Database interface functions
├── tweet_scraper_service.py # Core Twitter scraping logic
└── .env                   # Environment variables

Installation

Prerequisites

Python 3.8 or higher
Node.js 16.x or higher
MySQL database

Step 1: Clone the Repository

git clone <repository-url>
cd twitter-scraper

Step 2: Install Python Dependencies

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

# Install dependencies
pip install mysql-connector-python python-dotenv twikit

Step 3: Install JavaScript Dependencies

cd twitter-scraper-app
npm install

Step 4: Configure Environment Variables

Create a .env file in the root directory:

# Twitter Credentials
TWITTER_USERNAME=your_username
TWITTER_EMAIL=your_email
TWITTER_PASSWORD=your_password

# Database Configuration
DB_HOST=localhost
DB_USER=root
DB_PASSWORD=your_db_password
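
As a rough illustration of how a backend script might consume these values, the sketch below reads the variables from a mapping such as `os.environ` and fails early when one is missing. The helper name `load_settings` and the returned dictionary are assumptions for illustration, not the project's actual code; the project lists python-dotenv for loading the `.env` file into the environment.

```python
import os
from typing import Mapping

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "TWITTER_USERNAME",
    "TWITTER_EMAIL",
    "TWITTER_PASSWORD",
    "DB_HOST",
    "DB_USER",
    "DB_PASSWORD",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, raising if any are missing or empty.

    Hypothetical helper: the real scripts may read these values differently.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Checking all variables up front tends to give a clearer error than a failed login or database connection later on.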

Step 5: Initialize the Database

# Run from the repository root (cd .. if you are still in twitter-scraper-app)
python initialize_db.py

Step 6: Start the Application

cd twitter-scraper-app
npm run dev

The application will be available at http://localhost:3000

Usage

Search for Tweets by Keyword

Navigate to the "Search" page
Enter your search query
Select search type (Latest, Top, or Media)
Choose the number of tweets to retrieve (1-100)
Click "Start Scraping"

Search for Hashtag Tweets

Navigate to the "Hashtags" page
Enter the hashtag (without the # symbol)
Select search type (Latest or Top)
Choose the number of tweets to retrieve
Click "Start Scraping"

Scrape User Tweets

Navigate to the "User Tweets" page
Enter the username (without the @ symbol)
Select tweet type (Tweets, Replies, Media, or Likes)
Choose the number of tweets to retrieve
Click "Start Scraping"

Search by Date Range

Navigate to the "Date Range" page
Enter your search query
Select start and end dates
Choose the number of tweets to retrieve
Click "Start Scraping"
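
The form fields above could be folded into a single search string. The sketch below assumes the date range is expressed with X's `since:` and `until:` search operators; whether this project builds its query that way is not stated in this README, and the function name `date_range_query` is hypothetical.

```python
from datetime import date


def date_range_query(query: str, start: date, end: date) -> str:
    """Append since:/until: operators to a search query.

    Assumption: the scraper accepts standard X search operators.
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"{query} since:{start.isoformat()} until:{end.isoformat()}"


example = date_range_query("python", date(2023, 7, 1), date(2023, 7, 10))
```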

View Scraping Jobs

Navigate to the "Jobs" page
Browse the list of all scraping jobs
Click "View Details" to see job details and scraped tweets

API Reference

Scrape API

POST /api/scrape

Initiates a scraping job.

Request Body:

{
  "type": "SEARCH_TWEETS",
  "params": {
    "query": "example search",
    "searchType": "Latest",
    "count": 30
  }
}

Types:

SEARCH_TWEETS: General search
HASHTAG_TOP_TWEETS: Hashtag search (top tweets)
HASHTAG_LATEST_TWEETS: Hashtag search (latest tweets)
USER_TWEETS: User tweets
DATE_RANGE_TWEETS: Date range search
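
A minimal sketch of building the request body in Python, rejecting job types that are not in the list above. The helper name `build_scrape_request` is hypothetical, and only the `SEARCH_TWEETS` parameters shown in the example are documented here; parameters for the other types are not specified in this README.

```python
import json

# Job types listed in this README.
JOB_TYPES = {
    "SEARCH_TWEETS",
    "HASHTAG_TOP_TWEETS",
    "HASHTAG_LATEST_TWEETS",
    "USER_TWEETS",
    "DATE_RANGE_TWEETS",
}


def build_scrape_request(job_type: str, **params) -> str:
    """Serialize a POST /api/scrape body, rejecting unknown job types."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"Unknown job type: {job_type}")
    return json.dumps({"type": job_type, "params": params})


body = build_scrape_request(
    "SEARCH_TWEETS", query="example search", searchType="Latest", count=30
)
```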

Response:

{
  "success": true,
  "result": {
    "jobId": 123,
    "tweetCount": 30
  },
  "rateLimitInfo": {
    "endpoint": "SearchTimeline",
    "limit": 50,
    "resetMinutes": 15
  }
}
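
A client might turn this response into a short status line. The sketch below assumes a failed request comes back with `"success": false`; the error shape is not documented here, and the function name `summarize_scrape_response` is hypothetical.

```python
def summarize_scrape_response(response: dict) -> str:
    """Describe a POST /api/scrape response in one line."""
    if not response.get("success"):
        # Assumption: failures are signalled by a falsy "success" field.
        return "Scrape failed"
    result = response.get("result", {})
    rate = response.get("rateLimitInfo", {})
    return (
        f"Job {result.get('jobId')} collected {result.get('tweetCount')} tweets; "
        f"{rate.get('endpoint')} allows {rate.get('limit')} requests "
        f"per {rate.get('resetMinutes')} minutes"
    )
```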

Jobs API

GET /api/jobs

Gets all jobs or a specific job's details.

Query Parameters:

jobId (optional): Get details for a specific job
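
For illustration, the two forms of the request URL can be built as follows. The base URL is the development address from the installation steps; the helper name `jobs_url` is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:3000"  # dev server address from the installation steps


def jobs_url(job_id: Optional[int] = None, base_url: str = BASE_URL) -> str:
    """Return the URL for all jobs, or for one job when job_id is given."""
    url = f"{base_url}/api/jobs"
    if job_id is not None:
        url += "?" + urlencode({"jobId": job_id})
    return url
```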

Response for all jobs:

{
  "success": true,
  "jobs": [
    {
      "job_id": 123,
      "job_type": "SEARCH_TWEETS",
      "query": "example",
      "parameters": {},
      "start_time": "2023-07-10T12:00:00Z",
      "end_time": "2023-07-10T12:01:30Z",
      "status": "COMPLETED",
      "tweet_count": 30,
      "created_at": "2023-07-10T12:00:00Z"
    }
  ]
}

Response for specific job:

{
  "success": true,
  "job": {
    "job_id": 123,
    "job_type": "SEARCH_TWEETS",
    "query": "example",
    "parameters": {},
    "start_time": "2023-07-10T12:00:00Z",
    "end_time": "2023-07-10T12:01:30Z",
    "status": "COMPLETED",
    "tweet_count": 30,
    "created_at": "2023-07-10T12:00:00Z"
  },
  "tweets": [
    {
      "id": "tweet_id",
      "user_name": "username",
      "user_id": "user_id",
      "text": "Tweet content",
      "created_at": "2023-07-09T10:00:00Z",
      "reply_count": 5,
      "retweet_count": 10,
      "bookmark_count": 2,
      "hashtags": ["example", "tweet"]
    }
  ]
}
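
As a small usage example, the job details response can be summarized on the client. The sketch below counts the most frequent hashtags across a job's tweets; the function name `top_hashtags` is hypothetical, and lowercasing the tags is a design choice of this sketch rather than something the API does.

```python
from collections import Counter
from typing import List, Tuple


def top_hashtags(job_details: dict, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most common hashtags in a job details response."""
    counts = Counter(
        tag.lower()
        for tweet in job_details.get("tweets", [])
        # Treat a missing or null hashtags field as an empty list.
        for tag in (tweet.get("hashtags") or [])
    )
    return counts.most_common(n)
```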

Rate Limit Management

The application tracks rate limits on the client side to avoid hitting Twitter API limits:

Automatic Tracking: Records API usage in localStorage
Visual Indicators: Shows remaining requests and time until reset
Form Disabling: Automatically disables forms when rate limits are reached
Reset Countdown: Displays countdown timer until rate limits reset

Twitter API Rate Limits

Function	Endpoint	Limit (per 15 min)
Search Tweets	SearchTimeline	50
Get User Tweets	UserTweets	50
Get User Replies	UserTweetsAndReplies	50
Get User Media	UserMedia	500
Get User Likes	Likes	500
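
The tracking idea can be sketched as a sliding-window counter over the limits in the table above. This is an illustration in Python only: the application itself records usage in the browser's localStorage, and its exact windowing scheme is not described here, so the class name `RateLimitTracker` and the sliding-window approach are assumptions.

```python
import time
from typing import Dict, List, Optional

WINDOW_SECONDS = 15 * 60

# Per-endpoint limits from the table above.
LIMITS = {
    "SearchTimeline": 50,
    "UserTweets": 50,
    "UserTweetsAndReplies": 50,
    "UserMedia": 500,
    "Likes": 500,
}


class RateLimitTracker:
    """Sliding-window request counter (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._calls: Dict[str, List[float]] = {name: [] for name in LIMITS}

    def _recent(self, endpoint: str, now: float) -> List[float]:
        # Drop timestamps that have aged out of the 15-minute window.
        cutoff = now - WINDOW_SECONDS
        self._calls[endpoint] = [t for t in self._calls[endpoint] if t > cutoff]
        return self._calls[endpoint]

    def remaining(self, endpoint: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        return LIMITS[endpoint] - len(self._recent(endpoint, now))

    def record(self, endpoint: str, now: Optional[float] = None) -> bool:
        """Record one request; return False if the limit is already reached."""
        now = time.time() if now is None else now
        if self.remaining(endpoint, now) <= 0:
            return False
        self._calls[endpoint].append(now)
        return True

    def seconds_until_reset(self, endpoint: str, now: Optional[float] = None) -> float:
        """Seconds until a request slot frees up (0.0 if one is available)."""
        now = time.time() if now is None else now
        recent = self._recent(endpoint, now)
        if len(recent) < LIMITS[endpoint]:
            return 0.0
        return recent[0] + WINDOW_SECONDS - now
```

A form could be disabled whenever `remaining()` returns 0, with `seconds_until_reset()` driving the countdown.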

Database Schema

Scraping Jobs Table

CREATE TABLE scraping_jobs (
    job_id INT AUTO_INCREMENT PRIMARY KEY,
    job_type VARCHAR(50) NOT NULL,
    query VARCHAR(255) NOT NULL,
    parameters JSON,
    start_time DATETIME NOT NULL,
    end_time DATETIME,
    status VARCHAR(20) NOT NULL,
    tweet_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Tweets Table

CREATE TABLE tweets (
    id VARCHAR(255) PRIMARY KEY,
    job_id INT,
    user_name VARCHAR(255),
    user_id VARCHAR(255),
    text TEXT,
    created_at DATETIME,
    reply_count INT DEFAULT 0,
    retweet_count INT DEFAULT 0,
    bookmark_count INT DEFAULT 0,
    hashtags JSON,
    raw_data JSON,
    indexed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (job_id) REFERENCES scraping_jobs(job_id)
);
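
A minimal sketch of mapping a scraped tweet onto the tweets table with a parameterized insert. The input dictionary follows the tweet shape shown in the Jobs API response; the helper name `tweet_to_row`, the timestamp format, and storing the whole input as `raw_data` are assumptions, and the project's `db_interface.py` may do this differently.

```python
import json
from datetime import datetime

# Column order matches the tweets table; %s is the placeholder style
# used by mysql-connector-python.
INSERT_TWEET_SQL = (
    "INSERT INTO tweets "
    "(id, job_id, user_name, user_id, text, created_at, "
    "reply_count, retweet_count, bookmark_count, hashtags, raw_data) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def tweet_to_row(tweet: dict, job_id: int) -> tuple:
    """Build the parameter tuple for INSERT_TWEET_SQL from one tweet."""
    created = tweet.get("created_at")
    if created is not None:
        # Assumption: timestamps arrive as ISO 8601 UTC strings, as in the API examples.
        created = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
    return (
        tweet["id"],
        job_id,
        tweet.get("user_name"),
        tweet.get("user_id"),
        tweet.get("text"),
        created,
        tweet.get("reply_count", 0),
        tweet.get("retweet_count", 0),
        tweet.get("bookmark_count", 0),
        json.dumps(tweet.get("hashtags", [])),  # JSON column
        json.dumps(tweet),  # JSON column holding the unmodified input
    )
```

With a mysql-connector-python cursor, the row would be passed as `cursor.execute(INSERT_TWEET_SQL, tweet_to_row(tweet, job_id))`, which keeps tweet text out of the SQL string itself.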

Technology Stack

Frontend

Next.js: React framework for the UI
Tailwind CSS: Utility-first CSS framework
Axios: HTTP client for API requests
React DatePicker: For date selection
TypeScript: For type safety

Backend

Python: Core scraping functionality
twikit: Twitter scraping library
MySQL: Database for storing tweets and jobs
mysql-connector-python: Database connection
python-dotenv: Environment variables

Troubleshooting

Authentication Issues

If you encounter authentication errors:

Check your Twitter credentials in the .env file
Delete the cookies.json file (if it exists) to force re-authentication
Ensure your Twitter account is not locked or requiring additional verification

Database Connection Issues

If database connection fails:

Verify MySQL is running
Check database credentials in the .env file
Run initialize_db.py again to create the database and tables

Rate Limit Errors

If hitting rate limits:

Wait for the rate limit to reset (15 minutes)
Reduce the number of requests by lowering the tweet count
Space out your scraping jobs

Installation Problems

Common installation issues:

MySQL Connector Error: Ensure you have the proper MySQL development libraries installed

# Ubuntu/Debian
sudo apt-get install python3-dev default-libmysqlclient-dev build-essential
# macOS
brew install mysql-client

Node.js Errors: Make sure you're using a compatible Node.js version (16.x or higher)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

For any questions or support, please open an issue in the GitHub repository.
