Multithreading Using concurrent.futures Module Not Working

Post Body

Hello friends. I have to test 1200 proxies to see whether they are still active. Testing one proxy at a time works fine, but it takes a long time to get through all 1200. To speed things up, I decided to use Python's built-in concurrent.futures module.

Below is the code for reference. I want the final output saved in a variable so that I can print all the results or write them to a CSV file. But I think I am doing something wrong. I tried searching Google and even watched videos, but I have been unable to troubleshoot the problem. This is the first time I am trying something like this. Kindly guide me.

Is there a better or more elegant way to perform this task, assuming the URL is different for each request?

import requests
import pandas as pd
import time
import concurrent.futures as cf
import random

MAX_THREADS = 30

df = pd.read_csv(
    "proxy_file.csv",
    header=0,
    sep=','
)

# This is a fixed url and will not change. I just provided one for reference
URL = "https://google.com/"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'
}

def test_proxy(url, df):
    status = []

    for index, row in df.iterrows():
        # Setting up the proxy
        proxies = {
            'http': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}",
            'https': f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}"
        }

        with requests.Session() as session:
            try:
                print(f"testing proxy: {proxies}")
                site = session.get(URL, headers=headers, proxies=proxies)
                status.append(site.status_code)
            except Exception as e:
                status.append("Error")

        # Adding a random time variable between two requests to avoid creating a pattern
        time.sleep(random.randint(0, 5))
    return status


threads = min(MAX_THREADS, df.shape[0])

with cf.ThreadPoolExecutor(max_workers=threads) as executor:
    results = executor.map(test_proxy, URL)

print(results)
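
For anyone landing here with the same problem, here is a rough sketch of one way this could be restructured. As far as I can tell, two things go wrong in the code above: executor.map(test_proxy, URL) treats the URL string as the iterable, so it schedules one call per character and each call is missing the df argument; and print(results) shows the lazy iterator that map returns, not the values. The sketch gives each worker a single proxy row and wraps the map call in list() to collect the results. The names proxy_url, check_proxy and check_all_proxies and the 10-second timeout are my own assumptions, not part of the original post, and the random sleep is left out.

```python
import concurrent.futures as cf

MAX_THREADS = 30
URL = "https://google.com/"  # same fixed URL as in the post
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) "
                  "Gecko/20100101 Firefox/74.0"
}


def proxy_url(row):
    """Build the proxy URL for one row (a dict with username/password/ip/port)."""
    return f"http://{row['username']}:{row['password']}@{row['ip']}:{row['port']}"


def check_proxy(row, url=URL, timeout=10):
    """Test ONE proxy and return the HTTP status code, or "Error" on failure.

    The 10-second timeout is an assumed value; without some timeout a dead
    proxy can keep a worker thread waiting for a very long time.
    """
    # Imported here so the pooling helper below can be tried without requests.
    import requests

    proxy = proxy_url(row)
    try:
        response = requests.get(
            url,
            headers=HEADERS,
            proxies={"http": proxy, "https": proxy},
            timeout=timeout,
        )
        return response.status_code
    except requests.RequestException:
        return "Error"


def check_all_proxies(rows, check=check_proxy, max_workers=MAX_THREADS):
    """Run `check` on every row in a thread pool; results keep the input order."""
    rows = list(rows)
    if not rows:
        return []  # ThreadPoolExecutor rejects max_workers=0
    workers = min(max_workers, len(rows))
    with cf.ThreadPoolExecutor(max_workers=workers) as executor:
        # map() passes one row per call; list() forces the lazy iterator
        # so the actual return values are collected.
        return list(executor.map(check, rows))
```

With the DataFrame from the post, something like df["status"] = check_all_proxies(df.to_dict("records")) should put one result next to each proxy, and df.to_csv("proxy_status.csv", index=False) would then write them out (the output file name is only an example). If the URL really differs per request, one option is to store it in each row and read it inside check_proxy.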

Author

unknownguy0518

Subreddit

r/learnpython

Post Details

Posted: 4 years ago