From the Classroom to the Boardroom – Putting ChatGPT to the Test

An experiment that seeks to answer a simple question: can ChatGPT effectively generate boardroom strategy?

Ethan Mollick, Associate Professor of Entrepreneurship & Innovation at Wharton, wrote about the results of his AI experiment this semester. His focus has been to require that his students use AI tools to spur creativity, generate ideas, and produce written and artistic material. All of his classes shared a standard AI policy framing the minimum toolset (ChatGPT, image generation) and guardrails regarding the tools' limitations. His insights are a must-read not just for higher education, but for corporate and non-profit Board members as well.

Core to Professor Mollick’s learnings is that without training, almost everyone uses AI wrong:

By far the best approach, which led to both the best essays and the most impressed students, happened when people took the co-editing approach. The approach required a lot of careful focus on the AI output, which also made it very useful for student learning.

One of the more interesting insights was the importance of investing in prompt construction in a way that gives the system a persona. For example, the way a student would present to a professor may carry a different tone and intent than, say, the way the President of a University would present a proposal to the Board of Trustees. This prompted me to run my own experiment from the perspective of a Board member.
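For readers who work with the API rather than the chat interface, persona prompting is usually done by placing the persona in a system-level message ahead of the task. Here's a minimal sketch; the helper name, persona text, and task text are my own illustrations, and the commented-out call shows where a real OpenAI chat-completion request would go:

```python
def build_persona_messages(persona: str, task: str) -> list[dict]:
    """Attach a system-level persona so the model adopts the intended tone and audience."""
    return [
        {"role": "system", "content": f"You are {persona}."},
        {"role": "user", "content": task},
    ]

messages = build_persona_messages(
    "the President of a private research university addressing the Board of Trustees",
    "Deliver a short speech on moving the university from Carnegie R2 to R1 status.",
)

# The messages list would then be passed to a chat-completion call, e.g.:
# client = openai.OpenAI()
# response = client.chat.completions.create(model="gpt-4", messages=messages)
```

The same effect in the chat interface is achieved by opening the prompt with "You are the President of..." before stating the task.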

Putting ChatGPT to the Boardroom test

As a thought experiment, I took a real-world scenario I'm involved with and put ChatGPT to the test. As a Trustee and second-generation alum of The University of Tulsa (TU), I've been deeply involved in its transformation over the past four years. That work included bringing in BCG to augment regional insights, and working through shared governance in the ad-hoc Strategy Committee to build a focused strategic plan. I've since served as Vice Chair of the Academic Committee, on the Presidential and Provost search committees, and, for the past few years, on the Executive Committee of the Board. All of these experiences gave me context for my test.

One of the stated goals of TU’s President Carson is to accelerate the University from Carnegie R2 to R1 status. For the uninitiated, Carnegie Classifications are like a report card for how good a university is at doing research. R2 means you’re very good at doing research. R1 means you’re among the very best. So, I asked ChatGPT to generate three strategic plans to achieve this goal, using different prompts.

Board Room Challenge #1: The Generic Prompt

My first prompt was intentionally generic, without context of who the writer or audience are:

[Image: Challenge #1: The Generic Prompt]

The results were provided in under a minute and offered a convincing framework, though one lacking quantitative measures and focus. For example, the Carnegie Classification system sets basic criteria for Doctoral Institutions, such as awarding at least 20 research/scholarship doctorates and reporting at least $5M in total research expenditures per year. ChatGPT made no mention of these general criteria, nor of major sources of research dollars. The second issue, a complete miss: the published fact that the methodology is being updated in 2024, with a revised Basic classification and a new Social and Economic Mobility classification. The third issue was the long numbered list, clocking in at over 400 words. At eight numbered points, it lacked clarity and focus. I'll give it credit for recognizing that the University is known nationally by the shorthand TU, and for using it correctly.

Board Room Challenge #2: The [More] Focused Prompt

For my second test, I asked for a more focused response, a five-point plan:

[Image: Challenge #2: The [More] Focused Prompt]

The core of the plan is consistent; however, this version adds detail, such as the NSF and NIH being major federal sources of grants. And while ChatGPT made no mention of organizational culture in the first challenge, here it's important enough to be its own bullet point (which I agree with).

At 343 words, the recommendations still lack word economy. Some sections, such as the Graduate Education recommendations, were longer than in the original.

Board Room Challenge #3: The Length Constrained Prompt

For the next test, I wanted to force brevity. I asked ChatGPT to give me a proposal in 250 words or less:

[Image: Challenge #3: 250 words or less prompt (fail)]

ChatGPT asserted that the response met the 250-words-or-less requirement. It was clearly working harder to meet that goal and took significantly longer to generate (I didn't use a stopwatch, sorry). Unfortunately, it was also wrong: the response was 351 words in its entirety. Even giving the benefit of the doubt and measuring only the five points, ChatGPT generated 275 words. Detail was also lost, such as the NSF and NIH being major sources of grant funding, in favor of filler phrases such as “best and brightest”. There's some room for improvement here.
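Since the model can't be trusted to count its own words, it's worth verifying the constraint yourself. A trivial sketch (whitespace tokenization is an approximation of how word processors count, and the draft text here is a placeholder, not the model's actual response):

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens, a rough word-processor-style measure."""
    return len(text.split())

# Paste the model's response here and check it against the stated limit.
draft = "Our plan to reach R1 status rests on five pillars ..."
if word_count(draft) > 250:
    print("Over the limit; ask the model to trim its response.")
```

A follow-up prompt quoting the actual count ("Your response was 351 words; reduce it to 250") tends to work better than simply restating the constraint.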

Bonus Challenge: ChatGPT as the University President?

For my final challenge, based on Professor Mollick’s suggestion that I ask ChatGPT to adopt a persona to generate a better response, I asked it to be the President of the University, and deliver a speech on the same topic to the Board of Trustees:

[Image: Bonus Challenge #4: Be the University President]

Here, in the opening, we see that ChatGPT has a general sense of the persona and the audience. I sensed a little gravitas, as well as new concepts such as explaining to the Board the benefit of greater prestige and the responsibility of a higher education institution. Perhaps ChatGPT understands the Board's responsibility as fiduciaries? These concepts are absent from the prior three tests. And while the response lacks specificity, it does provide a solid and convincing argument. The results were generated in less than two minutes, certainly faster than a human could write a roughly three-minute speech. That said, the speech would generate numerous questions, absent specifics. The University President's role is safe.

Final Thoughts

As a follow-up to my recent piece on how AI is making me better at BBQ, this little experiment again demonstrates the productivity benefits of generative AI, as well as its rough edges. Key to adoption are a core code of ethics and education regarding responsible AI use. Professor Mollick and his colleagues are doing an excellent job of providing resources for other institutions, such as his AI policy, guides for writing with ChatGPT, and guides for generating ideas as starting points. Each institution is going to have to develop policies specific to its domain.

I expect that as we move past this period of mass experimentation, we'll hear more conversations about the policies needed, and the training and education required, to enable responsible adoption of AI. For example, below is Bing's response to the same query:

[Image: When asked to generate a university president's speech, Bing declined, saying plagiarism is wrong.]
The new Bing won’t help me write a speech, for ethical reasons.

Over the past few years, I was involved in creating and implementing responsible AI frameworks for building software. In my next article, I'll share personal thoughts on a similar framework for the boardroom and the classroom.


This article was written by a human, augmented with AI to check my grammar.
