You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

使用Python登录网站遇Response [404]错误,新手求助排查方案

Troubleshooting Your Python Web Scraping Login Script (404 Error)

Hey there! Let's dig into why your login script is returning a 404 error. I've gone through your code and checked the target website, and here are the key issues and fixes to get you up and running:

1. You're POSTing to the Wrong URL

This is almost certainly the root cause of your 404. When you submit a login form in the browser, it doesn't always send data to the same URL you're viewing. To find the correct endpoint:

  • Open your browser's DevTools (F12)
  • Switch to the Network tab
  • Enter your credentials and click "Anmelden"
  • Look for the POST request in the list—its URL is where you need to send your form data, not the login page URL itself.

For this specific site, the login form's action attribute points to a different path (likely something like /fruchtgenuss/login_check), not /fruchtgenuss/login. Your script is sending data to the login page itself, which doesn't handle POST requests, hence the 404.

2. Fix Your Credential Formatting

You're using HTML entities (<username>) for your credentials—these need to be replaced with your actual username and password as plain text. For example:

form["nick"] = "my_actual_username"
form["password"] = "my_actual_password"

3. Adjust Headers for Better Browser Simulation

While your User-Agent is good, adding a Referer header tells the server you came from the login page, which helps mimic real browser behavior. You can also let requests handle the Content-Type header automatically (it will set it to application/x-www-form-urlencoded for form data).

Updated Working Script

Here's a revised version of your script that addresses all these issues:

import requests
from lxml import html

# Initialize session to persist cookies across requests
session_requests = requests.session()
login_url = "https://app.foodcoops.at/fruchtgenuss/login"

# Mimic real browser headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Referer': login_url
}

# Fetch login page to extract form details and CSRF tokens
result = session_requests.get(login_url, headers=headers)
tree = html.fromstring(result.content)  # Use content instead of text for proper encoding handling

# Extract hidden form fields (critical for validating the request)
hidden_inputs = tree.xpath('//form//input[@type="hidden"]')
form_data = {input.attrib["name"]: input.attrib["value"] for input in hidden_inputs}

# Add your actual login credentials
form_data["nick"] = "your_actual_username"
form_data["password"] = "your_actual_password"
form_data["commit"] = "Anmelden"

# Get the correct form submission URL from the form's action attribute
form_action = tree.xpath('//form[@id="login-form"]/@action')[0]
# Handle relative URLs by combining with the base site URL
if not form_action.startswith('http'):
    form_action = f"https://app.foodcoops.at{form_action}"

# Submit the form to the correct endpoint
login_response = session_requests.post(form_action, data=form_data, headers=headers)

# Verify the result
print(f"Response Status Code: {login_response.status_code}")
print("\nFirst 500 characters of response:")
print(login_response.text[:500])

Key Notes:

  • Session Persistence: Using requests.session() ensures cookies (like the session cookie after successful login) are retained for subsequent requests to the site.
  • Dynamic Form Action: By extracting the form's action attribute, you avoid hardcoding the wrong URL and adapt to any future changes the site might make.
  • Encoding: Using result.content instead of result.text ensures you handle any character encoding correctly when parsing the HTML.

If you still run into issues, check the DevTools Network tab again to compare your script's request headers and form data against what the browser sends—sometimes sites require additional headers or have hidden fields you might have missed.

内容的提问来源于stack exchange,提问作者Gozy4

火山引擎 最新活动