如何在特定日期自动触发Python脚本实现月度DataFrame更新
Alright, let's break this down into two key parts: creating the Python script that adds the new monthly data, and setting up the automatic trigger to run it on the 6th of every month.
Step 1: Write the Python Script
First, we need a script that loads your existing DataFrame, calculates the next month's details, appends the new row, and saves the updated data. This script handles date logic automatically, so you don't have to manually input month names or dates.
Here's the complete script (save it as update_monthly_data.py):
import pandas as pd from datetime import datetime, timedelta import calendar def add_next_month_entry(): # Load existing data (or create empty df if file doesn't exist) try: df = pd.read_csv('monthly_data.csv') # Convert date columns to datetime objects for calculations df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%m/%d/%Y') df['End_Date'] = pd.to_datetime(df['End_Date'], format='%m/%d/%Y') df['Updated_on'] = pd.to_datetime(df['Updated_on'], format='%m/%d/%Y') except FileNotFoundError: # Initialize empty df if starting fresh (matches your example structure) df = pd.DataFrame(columns=['Month', 'Product', 'Start_Date', 'End_Date', 'Updated_on']) if not df.empty: # Get the last entry's end date to find the next month's start last_end_date = df['End_Date'].iloc[-1] next_start_date = last_end_date + timedelta(days=1) else: # If no data exists yet, start with January 2020 (as per your example) next_start_date = datetime(2020, 1, 1) # Extract year and month for the new entry new_year, new_month = next_start_date.year, next_start_date.month # Get full month name (e.g., "June") month_name = next_start_date.strftime('%B') # Calculate last day of the new month last_day = calendar.monthrange(new_year, new_month)[1] next_end_date = datetime(new_year, new_month, last_day) # Calculate Updated_on: 6th day of the following month if new_month == 12: updated_year, updated_month = new_year + 1, 1 else: updated_year, updated_month = new_year, new_month + 1 updated_on = datetime(updated_year, updated_month, 6) # Create the new row new_row = pd.DataFrame({ 'Month': [month_name], 'Product': ['Beverage'], # Adjust this if your product varies 'Start_Date': [next_start_date.strftime('%m/%d/%Y')], 'End_Date': [next_end_date.strftime('%m/%d/%Y')], 'Updated_on': [updated_on.strftime('%m/%d/%Y')] }) # Append and save the updated DataFrame df = pd.concat([df, new_row], ignore_index=True) # Convert dates back to your original string format df['Start_Date'] = df['Start_Date'].dt.strftime('%m/%d/%Y') df['End_Date'] = df['End_Date'].dt.strftime('%m/%d/%Y') df['Updated_on'] = df['Updated_on'].dt.strftime('%m/%d/%Y') df.to_csv('monthly_data.csv', index=False) print(f"Successfully added entry for {month_name} {new_year}") if __name__ == "__main__": add_next_month_entry()
Notes on the Script:
- It loads your existing
monthly_data.csvfile (place this in the same folder as the script, or use an absolute path like'/home/you/data/monthly_data.csv'). - If the file doesn't exist, it starts fresh with January 2020 (matching your example).
- It automatically calculates the next month's start/end dates and the
Updated_onvalue (6th of the following month). - It saves the updated data back to the CSV file, overwriting the original (keep a backup if needed!).
Step 2: Set Up Automatic Trigger
Now, we need to make this script run on the 6th of every month. The method depends on your operating system:
For Windows: Use Task Scheduler
- Open Task Scheduler (search for it in the Start Menu).
- Click Create Basic Task in the right-hand pane.
- Name the task (e.g., "Monthly Beverage Data Update") and add a description, then click Next.
- Select Monthly as the trigger, click Next.
- Set it to run on the 6th day of every month, at your preferred time (e.g., 9:00 AM), then click Next.
- Choose Start a program as the action, click Next.
- In Program/script, browse to your Python executable (e.g.,
C:\Python311\python.exe). To find this, open Command Prompt and runwhere python. - In Add arguments, enter the full path to your script (e.g.,
"C:\Users\You\Scripts\update_monthly_data.py"). - In Start in, enter the folder where your script and CSV file are located (e.g.,
"C:\Users\You\Scripts"), then click Next. - Check Open the Properties dialog for this task when I click Finish, then click Finish.
- In the Properties window, go to the Settings tab and check Run whether user is logged on or not (so it runs even if you're not signed in). Click OK and enter your Windows password when prompted.
For macOS/Linux: Use Cron
- Open Terminal.
- Run
crontab -eto edit your cron jobs (this will open a text editor, usually nano). - Add this line to the file (replace paths with your actual ones):
0 9 6 * * /usr/bin/python3 /home/you/scripts/update_monthly_data.py- Breakdown:
0: Run at the 0th minute (top of the hour)9: Run at 9 AM6: Run on the 6th day of the month*: Every month*: Every day of the week/usr/bin/python3: Path to your Python 3 executable (find this withwhich python3)/home/you/scripts/update_monthly_data.py: Full path to your script
- Breakdown:
- Save and exit: In nano, press
Ctrl+O, hit Enter, thenCtrl+X. - Verify the cron job is set up by running
crontab -l.
Important Tips:
- Test the script first: Run it manually once (
python update_monthly_data.py) to make sure it adds the correct row to your CSV. - Use absolute paths: To avoid issues with working directories, use full paths for the CSV file in the script and for the script path in the scheduler/cron.
- Check permissions: On macOS/Linux, ensure the script has execute permissions (run
chmod +x /path/to/update_monthly_data.pyif needed).
That's it! Your script will now automatically run on the 6th of every month and add the new month's data to your DataFrame.
内容的提问来源于stack exchange,提问作者harsh




