
Multithreading Using concurrent.futures Module Not Working

Hello friends. I have to test 1200 proxies to see if they are still active or not. It works fine when one proxy is tested at a time, but it takes a long time to finish all 1200 proxies. So to speed things up, I decided to use Python's built-in concurrent.futures module.

Below is the code for reference. I want the final output to be saved in a variable so that I can print all the results or even write them to a CSV file. But I think I am not doing something right. I tried some Google searches and even watched videos but was unable to troubleshoot the problem. This is the first time I am trying something like this. Kindly guide me.

Is there a better or more elegant way to perform this task, assuming the URL is different for each request?

import requests
import pandas as pd
import time
import concurrent.futures as cf
import random

MAX_THREADS = 30

df = pd.read_csv(
    "proxy_file.csv",
    header=0,
    sep=','
)

# This is a fixed url and will not change. I just provided one for reference
URL = "https://google.com/"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'
}

def test_proxy(url, df):
    status = []

    for index, row in df.iterrows():
        # Setting up the proxy
        proxies = {
            'http': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}",
            'https': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}"
        }

        with requests.Session() as session:
            try:
                print(f"testing proxy: {proxies}")
                site = session.get(URL, headers=headers, proxies=proxies)
                status.append(site.status_code)
            except Exception as e:
                status.append("Error")

        # Adding a random time variable between two requests to avoid creating a pattern
        time.sleep(random.randint(0, 5))
    return status


threads = min(MAX_THREADS, df.shape[0])

with cf.ThreadPoolExecutor(max_workers=threads) as executor:
    results = executor.map(test_proxy, URL)

print(results)
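
A minimal sketch of one way this could be restructured: give the executor one proxy per task, so executor.map has a per-proxy iterable to fan out over (as written above, executor.map(test_proxy, URL) iterates over the characters of the URL string and never passes df at all, and print(results) only shows the lazy map object). The helper name test_single_proxy, the 10-second timeout, and the proxy_results.csv filename below are illustrative assumptions, not from the original post.

import concurrent.futures as cf

import pandas as pd
import requests

MAX_THREADS = 30

# Fixed target URL, same as in the post above
URL = "https://google.com/"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'
}

def test_single_proxy(row):
    # One task per proxy: build the proxy dict from a single CSV row
    proxy_url = f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}"
    proxies = {'http': proxy_url, 'https': proxy_url}
    try:
        # The timeout is an assumption; without one, a dead proxy can hang a worker
        response = requests.get(URL, headers=headers, proxies=proxies, timeout=10)
        return response.status_code
    except Exception:
        return "Error"

df = pd.read_csv("proxy_file.csv", header=0, sep=',')
threads = min(MAX_THREADS, df.shape[0])

with cf.ThreadPoolExecutor(max_workers=threads) as executor:
    # map() returns a lazy iterator; list() drains it so the results land
    # in a variable that can be printed or written to a CSV file
    results = list(executor.map(test_single_proxy,
                                (row for _, row in df.iterrows())))

df['status'] = results
df.to_csv("proxy_results.csv", index=False)  # hypothetical output filename
print(results)

With one row per task, the random sleep between requests arguably no longer serves its original purpose of pacing a single sequential loop; if pacing still matters, a short time.sleep inside test_single_proxy would stagger the workers instead.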

