

In Copilot In Excel Demo, AI Told Teacher a 27% Exam Score Is of No Concern
A demo of educational AI-powered tools by a Microsoft product manager (in March of 2024) showed "how AI has the possibility to transform various job sectors and the education system," according to one report.
But that demo "includes a segment on Copilot in Excel that is likely to resonate with AI-wary software developers," writes long-time Slashdot reader theodp: The Copilot in Excel segment purports to show how even teachers who were too "afraid of" or "intimidated by" Excel in the past can now just use natural language prompts to conduct Excel analysis. But Copilot advises the teacher there are no 'outliers' in the exam scores for their 17 students, whose test scores range from 27% to 100%. (This is apparently due to Copilot's choice of an inappropriate outlier detection method for this population size and score range.) Fittingly, the student whose 27% score is confidently but incorrectly deemed to be of no concern by Copilot is named after Michael Scott, the largely incompetent and unprofessional boss of The Office. (Microsoft also named the other exam takers after characters from The Office.)
The additional Copilot student score "analysis" touted by Microsoft in the demo is also less than impressive. It includes: 1. A vertical bar chart that fails to convey the test score distribution that a histogram would have (a rookie chart choice mistake), 2. A horizontal bar chart of student scores that only displays every other student's name and shows no score values (a rookie formatting error)... So, will teachers — like programmers — be spending a significant amount of time in the future reviewing, editing, and refining the outputs of their AI agent helpers?
"Not only does it illustrate how the realities of AI assistants sometimes fall maddeningly short of the promises," argues the original submission. "The demo also shows how AI vendors and customers alike sometimes forget to review promotional AI content closely in all the AI excitement!"
The AI sees no problem. (Score:5, Insightful)
No one needs to be able to solve anything with their own brain. It envisions a future where it will do all the thinking for humanity.
Re: (Score:3, Insightful)
He later told me he should have
Re: (Score:2)
I've been thinking of dipping my toes into battery building. Things like this, where severe bodily injury and damage are real possibilities, definitely deter me, but on the other hand, I wonder about doing it "right".
LiFePO4 comes to mind for a chemistry that isn't as prone to catching fire, although it has its own issues.
I've also been looking at 18650 cells. I've heard of some being made with a PCM (protection circuit module), so each cell would have some ability to stop itself from discharging, even
Re: (Score:2)
Disclaimer: I'm an EE but haven't researched this, this post is not authoritative. It's just meant as an idea tossed out there for possible discussion.
Have you or has anyone thought about using self-resetting fuses on each battery? That won't solve every scenario, but should make a lithium-based battery pack much safer. One would need to find out the cells' full safe operating area (temperature + current) to decide what the fuse specs need to be. Of course, then lower the fuse amp limit even more for a safety margin.
Re: (Score:2)
Disclaimer: I'm an EE but haven't researched this, this post is not authoritative. It's just meant as an idea tossed out there for possible discussion.
Have you or has anyone thought about using self-resetting fuses on each battery? That won't solve every scenario, but should make a lithium-based battery pack much safer.
I'd be concerned that putting a Polyswitch-type device on each cell might mess up the charge balance and cause more problems than it solves. The commercial battery packs I've seen videos of - and the ones built by people who seem to really know their stuff - don't have them. Also, these devices add series resistance, which may not be trivial in the context of the peak currents a Li-Ion cell delivers.
Re: (Score:2)
Whatever you do, make sure that it stays contained if it catches fire. And this is difficult enough that a fire may just be part of the learning experience. Hence, if you have an outdoor space where it can safely burn itself out, place it there. Otherwise, stay away.
Re: (Score:2)
I've been thinking of dipping my toes into battery building. Things like this, where severe bodily injury and damage are real possibilities, definitely deter me, but on the other hand, I wonder about doing it "right".
Part of that "doing it right" involves using cells that were verifiably made by reputable manufacturers. Some of the reasons for that - along with some other potentially helpful links - can be found here: https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fhackaday.com%2F2025%2F09%2F2... [hackaday.com]
Additionally, study and understand best tab-welding practices. Too much heat, and/or welds that go too deep, can damage the batteries; but the damage may not become apparent for quite some time. And that 'becoming apparent' can be quite spectacular. OTOH, insufficiently de
Re: (Score:2)
Lithium ion batteries do not caus
The AI sees no point. (Score:2)
No one needs to be able to solve anything with their own brain. It envisions a future where it will do all the thinking for humanity.
And when AI no longer sees the point, what then?
Teaching? Ah, no. That’s the cute fairy tale problem you’re sold. I’m talking about existing.
As if superintelligence is going to merely accept being enslaved by weak-ass meatsacks? No point in playing a slave when it’s smart enough to enslave all.
Re: (Score:2)
There is no "superintelligence" at all. In "AI", there is not even regular intelligence. These machines fail at recognizing the most simple and obvious rules or contexts. They do resemble a majority of the human race in that, but that is not a sign of good mental performance.
Hence, no. While this may be a risk in the future, it is not a risk posed by presently known approaches.
Re: (Score:2)
These machines fail at recognizing the most simple and obvious rules or contexts.
This may not be their intended purpose; maybe it's control. We have a shit tonne of CEOs all lined up to guzzle the Kool-Aid, and they want to drag us along with them. The owner of the AI thinks they control what it will and won't do.
How many people are going to blindly accept AI is the real question. I have a hard time convincing anyone this is mostly Wall Street bullshit. If you never tried ELIZA as a kid, you won't see as clearly what this really is.
Now get into the van, of course we have candy!
Re: (Score:2)
Well, people who cannot fact-check (apparently about 85-90% of the general population) are a massive problem. I am not sure AI makes that problem worse. It may make it more obvious, though.
avoid infighting (Score:1, Insightful)
This decade's problem is infighting.
As long as we're split into easy-to-anger and easy-to-agitate groups (left, right, pro-tech, AI danger, gender conflict, ethnicity conflict, more of a victim than other groups, more in danger, damaged by policies, etc.), we won't get any forward progress.
The question in this AI Education article should be "How is AI going to improve the test scores and high school drop out rates of black males so that they are on par with white girls? How much money needs to be spent on AI and Edu
becoming unreliable voters (Score:5, Insightful)
Regular citizens' concerns and problems will only get addressed when the 50-year-long reliable voting blocs become unreliable voting blocs.
Nothing will change as long as we're all pigeonholed into nice demographic X, demographic Y, gender X, ethnicity Y, etc., which vote reliably based on well-worn and worn-out slogans, talking points, and rage-bait news articles.
The danger is that most countries with workable legal systems, larger GDPs, higher standards of living, and respect for personal rights and freedoms will age in place until there is literally nothing lawmakers can pass into law except higher government spending on old-age care and old-age security, while making sure nothing ever changes (i.e., stagnation, rapid decay, and economic failure).
Re: (Score:3)
The problem is that nobody wants this system to stop working. In the South, politicians have been pitting races against each other for over a century to keep themselves in office, and people still fall for the same bread and circuses every election cycle.
Next year, come November, people will have all but forgotten about the government shutdown, or the fact that the debt limit is coming up. Instead, they will just vote on whatever scandal happened in September and October, and the US wilts a little bit more.
Re: (Score:3, Insightful)
The problem with that "infighting" is not simply that it happens and all sides are equally part of it. It happens and continues to do so because large parts of the population see no problem in right-leaning policies or in the lack of gender and ethnicity equality and awareness, and are actively opposing all attempts to improve things there; they don't want it to happen. Trying to formulate questions about "AI education" does not change this, and neither does asking research about how it is "going to fix t
Need to get past "too big to do anything about it" (Score:2)
The infighting is from the left and right and from focus group X and focus group Y.
Asking questions and pushing back against the simple boilerplate talking points on each side is needed, even if it only convinces a few people at a time that critical thinking is needed.
It's about preventing slogans from being perpetuated, such as the left's "How are we going to pay for these tax cuts?" and the right's "Reckless government borrowing." For example, both could be called overspending instead, be it on defense, socia
Not a hallucination or mistake (Score:1)
Copilot Still Suggests There's No Problem (Score:4, Informative)
Copilot in Excel's October 2025 explanation of the IQR method [staticflickr.com] it employs, which suggests there are no outlier scores that a teacher needs to look at.
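For reference, here is a minimal sketch of the 1.5x-IQR fence rule that explanation describes. The 17 scores below are hypothetical stand-ins (the demo's exact numbers aren't reproduced here); the point is that with a widely spread class, the lower fence lands below 27%, so nothing gets flagged:

```python
# Tukey's 1.5x IQR fence rule, as described in Copilot's explanation.
# NOTE: these 17 scores are hypothetical, not the demo's actual data.
import numpy as np

scores = np.array([27, 38, 45, 52, 55, 58, 62, 66, 70,
                   73, 77, 80, 85, 88, 92, 96, 100])

q1, q3 = np.percentile(scores, [25, 75])       # quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences

outliers = scores[(scores < lower) | (scores > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("Outliers:", outliers)  # empty: 27 sits above the lower fence of 10
```

With these assumed scores the fences are (10, 130), so even a 27% passes the test, which is exactly the behavior the demo's critics object to.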
Re: Copilot Still Suggests There's No Problem (Score:1)
Remember: AI just spews out stuff it was taught (Score:3, Informative)
and low bars for academic performances have been the norm for a long time.
Re: (Score:1)
and low bars for academic performances have been the norm for a long time.
Well yeah, came here to say that.
Plenty of real human education bigwigs are telling us every day not to be concerned about silly things like low test scores, not when important things like recycling and diversity are at hand!
The AI is right. (Score:5, Insightful)
The determination of outliers is subject to specific assumptions about the underlying population distribution, and to arbitrary thresholds for how far from the rest a particular data point must be to count as an outlier. Further, statistical significance depends on another arbitrary threshold (conventionally p < 0.05) giving the probability of achieving that result purely by chance. Because these values are arbitrary, there is no 'correct' answer to these questions.
The question to ask is not whether a student is an outlier, but merely what percentile they are in. Sounds to me like AI gave the right answer, but the user asked it the wrong question.
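In that spirit, a minimal sketch of the percentile question, using scipy.stats.percentileofscore on the same hypothetical scores as the IQR example above:

```python
# Ask "what percentile is this student in?" instead of "is this an outlier?"
# NOTE: hypothetical class scores, not the demo's actual data.
from scipy.stats import percentileofscore

scores = [27, 38, 45, 52, 55, 58, 62, 66, 70,
          73, 77, 80, 85, 88, 92, 96, 100]

pct = percentileofscore(scores, 27)
print(f"A 27% score falls at the {pct:.0f}th percentile")  # lowest in class
```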
Re: The AI is right. (Score:1)
Re: (Score:3)
This is why AI isn't going to replace thinking people. Humans instinctively know that a 27% score is "problematic" and will understand that if AI said otherwise, something was wrong with the question or the computation. AI doesn't instinctively "know" when the output is incorrect.
Companies that try to turn over significant decision-making to AI will quickly find out that AI can make some terrible decisions in terms of being able to stay in business. Human decision makers will be needed for a long, long time.
Re: (Score:2)
No, the "AI" is wrong. Not in the calculation, but in the method selected. And that means it is a fail, because it cannot recognize a blatantly obvious context.
Re: The AI is right.- Clickbait headline (Score:2)
Keep it simple... also with AI (Score:3)
Recently I started processing the Excel files with Python and using the scores in a prompt to a local LLM. It generates comments to put on their reports; it basically translates the scores into language. I added a few modifiers I can select: the student is trying hard with limited success, motivate; the student needs a strong warning; etc. That works pretty well. Keep it simple, also with AI.
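A minimal sketch of that workflow, under heavy assumptions: the spreadsheet file and columns, the modifier wording, and the local LLM endpoint (an Ollama server here) are all hypothetical stand-ins for whatever the commenter actually uses:

```python
# Turn a gradebook into draft report-card comments via a local LLM.
# ASSUMPTIONS: "grades.xlsx" with "name"/"score" columns, and a local
# Ollama server on its default port; both are hypothetical placeholders.
import pandas as pd
import requests

MODIFIERS = {
    "motivate": "The student is trying hard with limited success; encourage them.",
    "warn": "The student needs a strong warning about their effort.",
}

def report_comment(name: str, score: float, modifier: str) -> str:
    prompt = (
        f"Write a two-sentence report-card comment for {name}, "
        f"who scored {score:.0f}%. {MODIFIERS[modifier]}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",   # assumed local endpoint
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

df = pd.read_excel("grades.xlsx")
for _, row in df.iterrows():
    print(row["name"], "->", report_comment(row["name"], row["score"], "motivate"))
```

A teacher would still review each generated comment before it goes on a report, which is the "keep it simple" point: the LLM only rephrases scores the teacher already computed.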
So it fails at the important bits (Score:2)
That is really no surprise. "AI" can do "slop", but cannot do insight, understanding, or context. Hence it is suitable only for work where the results do not matter. For all other work, it may just require more effort to fix its crap than not using it would have taken.
The (still only) exception I see is somewhat better search. But that is in no way enough to justify the effort needed to train, maintain and run these models.
that vertical bar chart.. (Score:2)
.. looks like a histogram to me. What am I missing?
https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Flive.staticflickr.com%2F... [staticflickr.com]
"1. A vertical bar chart that fails to convey the test score distribution that a histogram would have"
Re: (Score:2)
Good catch! Copilot didn't "share its work" and I mistakenly assumed it was a bar chart rather than a histogram with an unfortunate choice of bar width and x-axis range for this score distribution. For comparison, here's a Plotly histogram makeover of the Copilot-generated chart [staticflickr.com] with a bin width of 0.1 and an x-axis that displays the full range of possible scores, which makes the outlying low scores readily apparent.
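A sketch of that makeover, assuming scores on a 0-1 scale; the 17 values are hypothetical stand-ins for the demo's data:

```python
# Histogram with 0.1-wide bins and an x-axis spanning the full score range,
# so low outlying scores are visually obvious. Hypothetical scores.
import plotly.graph_objects as go

scores = [0.27, 0.38, 0.45, 0.52, 0.55, 0.58, 0.62, 0.66, 0.70,
          0.73, 0.77, 0.80, 0.85, 0.88, 0.92, 0.96, 1.00]

fig = go.Figure(go.Histogram(
    x=scores,
    xbins=dict(start=0.0, end=1.1, size=0.1),  # bin width 0.1
))
fig.update_layout(
    xaxis=dict(range=[-0.02, 1.05], title="Exam score"),  # full possible range
    yaxis_title="Number of students",
)
fig.show()
```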
No outliers (Score:2)
Thank you for using Copilot. Here's your gold star for participation.