This post has been de-listed
It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.
Hello friends. I have to test 1200 proxies to see if they are still active or not. It works fine when one proxy is tested at a time. But it takes a long time to complete testing all 1200 proxies. So to speed things up, I decided to use the python's inbuilt concurrent.futures module.
Below is the code for reference. I want the final output to be saved in a variable so that I can print all the results or even write it to a csv file. But I think I am not doing something right. I tried some google search and even watched videos but unable to troubleshoot the problem. This is the first time I am trying something like this. Kindly guide me.
Is there a better or more elegant way to perform this task assuming the URL is different for reach request?
import requests
import pandas as pd
import time
import concurrent.futures as cf
import random
MAX_THREADS = 30
df = pd.read_csv(
"proxy_file.csv",
header=0,
sep=','
)
# This is a fixed url and will not change. I just provided one for reference
URL = "https://google.com/"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'
}
def test_proxy(url, df):
status = []
for index, row in df.iterrows():
# Setting up the proxy
proxies = {
'http': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}",
'https': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}"
}
with requests.Session() as session:
try:
print(f"testing proxy: {proxies}")
site = session.get(URL, headers=headers, proxies=proxies)
status.append(site.status_code)
except Exception as e:
status.append("Error")
# Adding a random time variable between two requests to avoid creating a pattern
time.sleep(random.randint(0, 5))
return status
threads = min(MAX_THREADS, df.shape[0])
with cf.ThreadPoolExecutor(max_workers=threads) as executor:
results = executor.map(test_proxy, URL)
print(results)
Subreddit
Post Details
- Posted
- 4 years ago
- Reddit URL
- View post on reddit.com
- External URL
- reddit.com/r/learnpython...