ChatGPT did a better job than most residents

There was a very recent study where chatGPT achieved at least 60% for each step of the USMLE.

I don't remember the full details of the study but it was impressive.

An AI that's been trained on the entirety of Wikipedia and the publicly accessible internet got 40% of the USMLE wrong. That's impressive? A pre-med student allowed access to the internet for Step 1 could probably outscore that.

An AI that's been trained on the entirety of Wikipedia and the publicly accessible internet got 40% of the USMLE wrong. That's impressive? A pre-med student allowed access to the internet for Step 1 could probably outscore that.


It was all 3 steps, so it's more than just reciting the answer to "what nerve innervates X".

Being able to process clinical information, make recommendations, and interpret / respond to hypothetical situations. (And can then follow-up and carry on a conversation about the topic).

Very impressive, especially considering this is a free product and was not specifically trained on USMLE or given any special prompts.

This is beyond a proof of concept for what can happen given proper models and specialized training.
 
It was all 3 steps, so it's more than just reciting the answer to "what nerve innervates X".

Being able to process clinical information, make recommendations, and interpret / respond to hypothetical situations. (And can then follow-up and carry on a conversation about the topic).

Very impressive, especially considering this is a free product and was not specifically trained on USMLE or given any special prompts.

This is beyond a proof of concept for what can happen given proper models and specialized training.

You've never taken Steps, so while this may seem very impressive, once you do them you'll realize that the Steps are basically a combination of being a good test taker and brute-force memorizing key words/concepts to link in questions (which is why there's a whole mini-industry within med school test prep dedicated to reviewing "high yield" topics/subjects for Step 1 and 2). It's actually something you'd expect an AI to do very well at.
 
It was all 3 steps, so it's more than just reciting the answer to "what nerve innervates X".

Being able to process clinical information, make recommendations, and interpret / respond to hypothetical situations. (And can then follow-up and carry on a conversation about the topic).

Very impressive, especially considering this is a free product and was not specifically trained on USMLE or given any special prompts.

This is beyond a proof of concept for what can happen given proper models and specialized training.

.....how long ago did you take steps? There hasn't been a question like "what nerve innervates x" for like 50 years.

I'm aware of what ChatGPT can do, I use it quite often these days. For what it does, sure, it's impressive. But its performance on the USMLE is a great example of where it fails. Getting a 60% on the USMLE is abysmal. Again, a pre-med student with internet access could outscore that. No idea why people are trying to spin it as some sort of impressive accomplishment when it actually shows a pretty big limitation with ChatGPT.
 
Hm, I actually opened up the study to learn a bit more:


It looks like they did several tests varying the question style: one run was open-ended, another was multiple choice, and another required it to explain why the other answers were wrong.
Looks like at least one of the runs scored 75% for Step 1, 61% for Step 2, and 68% for Step 3.
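As a quick sanity check on the numbers quoted above, the per-step scores average out to about 68%:

```python
# Percent correct per step, as quoted from one run of the study.
scores = {"Step 1": 75, "Step 2": 61, "Step 3": 68}

average = sum(scores.values()) / len(scores)
print(f"Average across steps: {average:.1f}%")  # prints "Average across steps: 68.0%"
```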

They mention it had at least one "significant insight" but I suppose that was subjective to the graders.

This part was certainly fun though!:

[attached screenshot from the study]


Way to one-up PubMed, guys. Ha!


[attached screenshot from the study]
 
Looks like ChatGPT 4 went from merely passing the USMLE (as ChatGPT 3) to now averaging 86%. It's getting smarter and smarter.

 

This just in, calculators can math faster than people.
 
This just in, calculators can math faster than people.
It's not just calculating math. It's accurately answering brand-new questions it has never seen and generating new content.
 
It's not just calculating math. It's accurately answering brand-new questions it has never seen and generating new content.
It's astonishing that it has basically halved its error rate in the span of a few months (from 40% wrong to 20% wrong). I'm fairly convinced this is going to have large societal implications moving forward if we are seeing that rate of improvement.
 
It's not just calculating math. It's accurately answering brand-new questions it has never seen and generating new content.
It's computer generated. It's literally math.
 
I find it most remarkable that there seems to be a bimodal distribution of reactions to the technology. I hypothesize that some are determined to not find it impressive for some reason.

I was talking to my colleague who is a neurologist who works in, well, AI as applied to neuroimaging. He is a fairly savvy consumer of these technologies and says that the lab uses ChatGPT to write letters of recommendation, award nominations, a range of compliance reports, etc. This was not a thing a year ago.
 
It's impressive, and it can be pretty scary what the potential ramifications of this would be, but it's still a program written by a person and thus is still limited in what it can and cannot do.
 
It's impressive, and it can be pretty scary what the potential ramifications of this would be, but it's still a program written by a person and thus is still limited in what it can and cannot do.
In a very real sense it is not just 'written by a person.' It was programmed to learn and then fed a huge set of training data, with structured feedback from hundreds of people that it used to refine its outputs.

How it is doing what it is doing now is totally a black box. Nobody in the world could look at its code at this point and predict its responses in any meaningful way. This really is a whole different category of phenomenon.

It is limited by being a program written by someone in the same way a baby is limited by being some meat that came out of someone.
 
In a very real sense it is not just 'written by a person.' It was programmed to learn and then fed a huge set of training data, with structured feedback from hundreds of people that it used to refine its outputs.

How it is doing what it is doing now is totally a black box. Nobody in the world could look at its code at this point and predict its responses in any meaningful way. This really is a whole different category of phenomenon.

It is limited by being a program written by someone in the same way a baby is limited by being some meat that came out of someone.
I like the analogy to a baby, but even earlier in development: an embryo. No clue what it's going to turn into. Most embryos look the same.
 
It's not impressive because it's wrong half the time; I think it's a glorified chat bot that has access to all human internet chatter. It's impressive that it can crunch that information and be right half the time. Is GPT conscious? I don't know that we know what consciousness is, so maybe? I think ChatGPT is mostly sleight of hand with respect to intelligence; overall, it's an extremely powerful new search engine. I wouldn't trust it for making important decisions since it's wrong more often than I'm comfortable with. This is just my take, though it's informed by people who are experts in the field of AI and AI research.
 
I find it most remarkable that there seems to be a bimodal distribution of reactions to the technology. I hypothesize that some are determined to not find it impressive for some reason.

I was talking to my colleague who is a neurologist who works in, well, AI as applied to neuroimaging. He is a fairly savvy consumer of these technologies and says that the lab uses ChatGPT to write letters of recommendation, award nominations, a range of compliance reports, etc. This was not a thing a year ago.
I think the achievements of AI in the arts (literary and visual) are generally not impressive. They fed it all the dregs on the internet and it spat back the same. Presumably it would have done better if they'd fed it a diet of Keats, Thomas, and the Italian masters, but that's not what happened.

What I find truly terrifying is that it can apparently write very effective computer code. Once AI becomes self-replicating, things are well and truly out of our hands.

There apparently has already been at least one case of AI-assisted suicide.

 
I think the achievements of AI in the arts (literary and visual) are generally not impressive. They fed it all the dregs on the internet and it spat back the same. Presumably it would have done better if they'd fed it a diet of Keats, Thomas, and the Italian masters, but that's not what happened.

What I find truly terrifying is that it can apparently write very effective computer code. Once AI becomes self-replicating, things are well and truly out of our hands.

There apparently has already been at least one case of AI-assisted suicide.

Really, this is the danger of AI. Its capacity for dialogue or diatribe is only as good as its programmer, but its capacity to write potentially dangerous code is limited only by whatever fail-safes are built in, and if the algorithm is self-assembling, even those may not hold.
 
There's been an AI chatbot used for therapy since 2017 called Woebot that does online chat-based CBT.
 
I find it most remarkable that there seems to be a bimodal distribution of reactions to the technology. I hypothesize that some are determined to not find it impressive for some reason.

I was talking to my colleague who is a neurologist who works in, well, AI as applied to neuroimaging. He is a fairly savvy consumer of these technologies and says that the lab uses ChatGPT to write letters of recommendation, award nominations, a range of compliance reports, etc. This was not a thing a year ago.

Yeah exactly.
The tedious stuff no one wants to do. But that's not really impressive 'intelligence'.

ChatGPT was getting even the most basic questions wrong. "Who won the gold in 200m freestyle in the 1996 Atlanta Olympics" style questions. It searched the internet, found some answers but could not 'reason' what is to be trusted and what isn't.
 
Yeah exactly.
The tedious stuff no one wants to do. But that's not really impressive 'intelligence'.

ChatGPT was getting even the most basic questions wrong. "Who won the gold in 200m freestyle in the 1996 Atlanta Olympics" style questions. It searched the internet, found some answers but could not 'reason' what is to be trusted and what isn't.

To be fair, this doesn't actually separate it from a significant proportion of actual real-deal humans using the Internet.

At base this is Searle's Chinese room problem on steroids. For those not familiar, you have a guy in a room with an enormous book full of instructions. His job is to receive slips of paper through a slot with mysterious symbols on them, find the matching symbol in the book of instructions, and then write a different symbol derived from those instructions on another piece of paper, which he returns through the slot. For him it's just arbitrary symbol transformation.

Thing is, though, when you put a sentence of English on a piece of paper and put it through the slot, what you get back is the same sentence rendered perfectly in Mandarin. The guy in the room, to be clear, does not speak Chinese. Hell, let's say he doesn't even speak English, so he clearly doesn't understand how to translate English to Chinese.

But does the system of dude + book + slot know Chinese?

Searle thought "no", but it turns out to be pretty hard to justify that answer without acknowledging that, in some sense, the entire system understands Chinese. Objections often founder on the difficulty of specifying what, precisely, is lacking in understanding that is not instantiated in the system's ability to turn appropriate, well-formed, germane inputs in one language into appropriate, well-formed, germane outputs in another. To strengthen the case, say the system can also take sentences in Chinese as input and output appropriate responses in Chinese.
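The room's mechanics can be sketched as pure table lookup (the phrase pairs below are my own illustrative stand-ins, not anything from Searle):

```python
# Toy "Chinese room": the operator blindly follows a rule book,
# mapping input slips to output slips with zero understanding.
RULE_BOOK = {
    "Hello": "你好",       # illustrative pairs only, not a real translation system
    "Thank you": "谢谢",
    "Goodbye": "再见",
}

def room(slip: str) -> str:
    """Return whatever symbol the rule book dictates; blank slip if no rule matches."""
    return RULE_BOOK.get(slip, "")

print(room("Hello"))  # prints 你好, yet no component of the system "knows" Chinese
```

The operator here is just `dict.get`; the question is whether the system as a whole (table plus lookup plus slot) understands anything.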

Often people come up with objections that involve some reference to consciousness, but this is a shaky branch to put weight on. The philosopher Georges Rey was fond of saying that he had no conscious experiences; he used to, but he was in a bike accident when he was 10 and hadn't had one since. His point was that there is literally no way anyone can tell him this is incorrect since, as David Lewis put it, "an incredulous stare is not a strong counter-argument."

AIs like this are increasingly going to turn this problem up to 11.
 
The problem with that supposition is that it equates this with intelligence instead of assembly. What these programs are currently doing is taking known things and, based on the question, assembling them into a mixture of those known things.

People are impressed that it passed the USMLE or that a computer beat a chess master but honestly, I'm not. It's designed to do those things and when your processing power is devoted to that one thing you're going to be good at it.

It's the same reason a neurologist is not a rectal surgeon. I imagine if you gave the neurologist time, with their base knowledge, they would be able to put together an answer that makes sense, but that is not the same thing as knowledge, or even application of that knowledge, so much as it is making an educated guess based on what's available.

So an AI program passed a medical licensing test. K, can it practice medicine? Can it really do things in real time with a patient, examine them, and predict accurate outcomes based on what a patient says to it? Does it have the intuition to know when a patient is lying or when something doesn't make sense?

No. It's input and output. That's all. There's nothing complex about that.
 
It's a different form of intelligence, inorganic but still "evolving" and slurping up the vast amount of data available. It has the potential to usher in a new utopian age, but there are also grave dangers. This is a nice short book which explores both possibilities: Manna
 