Hey guys!
I've posted here a couple of times in the past, most recently about six months ago, and most of those posts were about whether there were any computer vision tools that could help blind people play video games.
I grew up loving video games, and I've loved them my entire life. They were a sense of escape, I think. Since I always knew I was inevitably going to lose my eyesight at some point, I probably stayed at home playing online games with friends more often than most, just to soak up all that fun while I still could.
Well, unfortunately that time came in 2021, when my genetic eye condition finally took the last of my vision. It was really tough at first, but as I learned to use my screen reader and got used to the tools available to blind folks, I reached a point where I was not only proficient with a computer, but honestly faster with it than I ever was sighted.
The one sore spot, though, was that I could no longer really play video games with friends, since most of the games accessible to blind users were audio games, and those aren't much fun for sighted players.
Well, a user over on the audio games forums created a tool a week or so ago that lets NVDA, the free screen reader written in Python, use the GPT-4 Vision API to take a snapshot and describe either the focused item, your entire screen, or an image copied to your clipboard. I was absolutely ecstatic when I heard this, and I had to give it a shot for navigation assistance. Let me tell you, I am blown away.
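
For anyone curious how something like this works under the hood, here's my rough understanding as a minimal sketch, using the OpenAI Python client and Pillow for the screen capture. To be clear, this is my own guess at the general shape of it, not the addon's actual code, and the model name is just what the vision API offered around that time:

```python
import base64
from io import BytesIO

from openai import OpenAI
from PIL import ImageGrab  # Pillow's screenshot helper (Windows/macOS)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_screen(prompt: str) -> str:
    # Grab the full screen and encode it as a PNG data URL.
    shot = ImageGrab.grab()
    buf = BytesIO()
    shot.save(buf, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

    # Send the guidance prompt plus the screenshot to the vision model.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision model name at the time; may differ
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content
```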
I used the prompt, "You are guiding a blind user through a video game environment. If an interface is open, describe the highlighted item. If no interface is open, guide the player to key points of interest and NPCs." I was sceptical at first, but I gave it a shot, and let's just say I was very moved by the experience.
Not only did it describe my interface flawlessly, it actually guided me out of a building and over to an NPC with zero issues, just like a sighted friend would.
Just wow.
Something like this wasn't even possible just six months ago when I posted that thread here, and now it is! I am just so grateful to everyone here who contributes to this field, and I wanted to share my excitement.
With that being said, I also have a couple of questions for you all! The only two things currently keeping me from using the tool daily are the price and the response time. The addon, which is called AI Content Describer for NVDA, costs me around $0.01 each time I use it. That surprised me, since I thought you could use some kind of low-quality mode to make each scan cheaper; my guess is that the plugin is a little broken at the moment and that feature isn't working.
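
For reference, this is the low-quality mode I meant. The API docs describe a "detail" field on each image, and "low" makes the model work from a fixed 512x512 downscale at a flat 85 input tokens per image (at least per the docs at the time), which should bring a scan down to well under a cent. A sketch of what that request might look like, again my assumption of how it's wired up rather than the addon's actual code:

```python
from openai import OpenAI

client = OpenAI()

def describe_image_cheap(data_url: str, prompt: str) -> str:
    """Same kind of request as above, but in the cheap low-detail mode."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # model name at the time; may differ
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # "detail": "low" tells the API to use a 512x512 downscale
                # at a flat 85 input tokens, instead of tiling the full image.
                {"type": "image_url",
                 "image_url": {"url": data_url, "detail": "low"}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content
```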
The second thing is the response time. It currently takes anywhere from 7 to 15 seconds on average to get a snapshot described, which is a little unfortunate since it makes real-time games rough. Is there some way to get around either of these hiccups?
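
The only workaround I've come up with myself is shrinking the screenshot before it gets uploaded, which should cut upload time and, in high-detail mode, the number of image tiles the model has to process. A small sketch assuming Pillow (the helper name is mine, not the addon's):

```python
import base64
from io import BytesIO

from PIL import ImageGrab

def capture_small_screenshot(max_side: int = 1024) -> str:
    """Grab the screen, shrink it, and return a PNG data URL."""
    shot = ImageGrab.grab()
    shot.thumbnail((max_side, max_side))  # in-place resize, keeps aspect ratio
    buf = BytesIO()
    shot.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
```

A smaller upload won't fix everything, though; from what I understand, most of the wait is the model generating its answer, so asking the prompt for one- or two-sentence descriptions and keeping max_tokens low probably matters as much as the image size.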
If those two things were fixed, I would seriously be so happy I could cry. That might sound strange, I know, but this is a hobby that was stolen from me and now feels genuinely within my grasp again, and I can't put into words how happy that makes me.
Anyhow, if you have read this far, thank you!