COT (Chain of Thought): How sequential reasoning enhances AI decision-making

One of the factors that makes AI powerful and useful is its ability to reason. Today’s AI cannot reason like a human, but it does provide simulated reasoning that holds up in most contexts. We have several tools available, such as fine-tuning, prompt design, and prompt engineering, to sharpen that reasoning for the problem a given AI application is meant to solve.

Some time ago, Amazon scrapped an AI-based recruiting tool because it favored male applicants over female applicants. On investigation, it was found that the system had been trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Most of those resumes came from men, a reflection of male dominance across the tech industry, and as a result the tool learned to systematically downgrade female applicants. Thanks to prevailing human wisdom, the tool was scrapped, and Amazon states that no personnel decisions were made based on the tool’s suggestions.

At the other extreme, Google’s Gemini AI produced historically inaccurate images of America’s Founding Fathers. This was the outcome of a systemic training bias aimed at embracing diversity: the context was lost, and the output was inaccurate.

If AI is lauded as a force multiplier for knowledge workers, yet it is riddled with bias and errors, how can we take its responses seriously? How can we use AI in mission-critical decision-making?

One fix is a prompt engineering technique known as Chain of Thought (COT), an approach in AI and machine learning that tackles problems by breaking complex tasks into sequential, logical steps, mimicking human reasoning. By having the model process and analyze information in a more structured and detailed manner, this method can improve the accuracy, transparency, and explainability of its output.

COT is how we learned to solve problems as children. Our math teachers gave us word problems like ‘Raju has 13 apples. He ate 3, gave 2 to Peter, and bought 4 more from Abu. How many apples does Raju have now?’ to improve our calculation skills: we broke the problem into smaller steps, gave a reason for the final answer, and justified any nuances in between.
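The apple problem can be written out as an explicit chain of steps, where each step records the intermediate count and how it was reached. A minimal Python sketch; the function name `solve_with_trace` is illustrative, not part of any library:

```python
def solve_with_trace(start: int, steps: list[tuple[str, int]]) -> tuple[int, list[str]]:
    """Apply each (description, change) step in order, recording one line of reasoning per step."""
    count = start
    trace = [f"Raju starts with {count} apples."]
    for description, change in steps:
        before, count = count, count + change  # keep the previous count so the step can be shown
        sign = "+" if change >= 0 else "-"
        trace.append(f"{description}: {before} {sign} {abs(change)} = {count}.")
    trace.append(f"Answer: Raju has {count} apples.")
    return count, trace


answer, trace = solve_with_trace(
    13,
    [("He ate 3", -3), ("He gave 2 to Peter", -2), ("He bought 4 more from Abu", 4)],
)
print("\n".join(trace))
```

The printed trace walks through 13 - 3 = 10, 10 - 2 = 8, and 8 + 4 = 12, so every intermediate value behind the final answer of 12 can be checked.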

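In a paper published by the Google Brain research team (Wei et al., 2022, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”), chain-of-thought prompting outperformed standard prompting on a range of arithmetic, commonsense, and symbolic reasoning benchmarks, with the largest gains in very large models.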
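In few-shot chain-of-thought prompting, as described in that paper, the worked examples shown to the model include the intermediate reasoning rather than only the final answer. A minimal sketch that only builds the two prompt strings for comparison; the exemplar is made up for illustration, and sending the prompt to a model is left out:

```python
# A hypothetical worked exemplar; the names and numbers are illustrative.
EXEMPLAR_QUESTION = ("Meena has 8 pencils. She gives 3 to Ravi and buys a pack of 5. "
                     "How many pencils does she have now?")
EXEMPLAR_REASONING = ("Meena starts with 8 pencils. After giving 3 away she has 8 - 3 = 5. "
                      "Buying a pack of 5 gives 5 + 5 = 10.")
EXEMPLAR_ANSWER = "The answer is 10."


def standard_prompt(question: str) -> str:
    """Standard few-shot prompting: the exemplar shows only the final answer."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: {EXEMPLAR_ANSWER}\n\nQ: {question}\nA:"


def cot_prompt(question: str) -> str:
    """Chain-of-thought prompting: the exemplar shows the intermediate steps before the answer."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: {EXEMPLAR_REASONING} {EXEMPLAR_ANSWER}\n\nQ: {question}\nA:"


question = ("Raju has 13 apples. He ate 3, gave 2 to Peter, and bought 4 more from Abu. "
            "How many apples does Raju have now?")
print(cot_prompt(question))
```

Because the model imitates the format of the exemplar, the COT version nudges it to write out its steps before giving an answer.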

Here’s an example of how an enterprise can use COT. Imagine Paisa Bank is using a traditional AI model to automate its loan approval process. The model receives inputs such as the applicant’s credit score, income, employment history, and requested loan amount, and based on these inputs it returns a binary decision: approve or reject. This leads to several challenges and questions, including:

  • Explainability: the model returns an approval or rejection without a clear explanation, making it difficult for bank employees to understand the reasoning behind a decision or to demonstrate regulatory compliance for the process.
  • Overlooked nuances: were any nuances missed in assessing the applicant’s creditworthiness? Was an approved loan a false positive (which leads to higher non-performing assets, or NPAs, for the bank)?
  • Transparency: how does the bank explain a rejection to the customer, whether it was a false negative or a true negative?

Now suppose the bank implements a COT-based AI model for loan approval. This model breaks the decision into a series of logical steps, mirroring how a human loan officer would assess the application.
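The steps that follow can also be handed to an LLM as an explicit checklist, so that its answer contains one line of reasoning per step before the decision. A minimal sketch that only builds the prompt text; the wording, the `LOAN_STEPS` list, and the application field names are hypothetical, and the call to an actual model is omitted:

```python
# Hypothetical checklist, in the order a human loan officer would work through it.
LOAN_STEPS = [
    "Categorize the credit score (excellent, good, fair, poor).",
    "Check the annual income against the requested loan amount for affordability.",
    "Review employment history for stability.",
    "Calculate the debt-to-income ratio and check whether it is acceptable.",
    "Consider past repayment behavior and any recent financial hardship.",
    "Give a final recommendation that cites the findings of each step above.",
]


def build_loan_cot_prompt(application: dict) -> str:
    """Build a prompt asking the model to reason through each step before deciding."""
    details = "\n".join(f"- {key}: {value}" for key, value in application.items())
    steps = "\n".join(f"Step {i}: {step}" for i, step in enumerate(LOAN_STEPS, start=1))
    return (
        "You are assisting a loan officer. Assess the application below.\n"
        f"Application:\n{details}\n\n"
        "Work through the following steps in order, writing one line of reasoning per step, "
        "and do not skip to the decision:\n"
        f"{steps}"
    )


print(build_loan_cot_prompt({
    "credit_score": 750,
    "annual_income_rs": 800000,
    "requested_loan_rs": 1000000,
    "years_with_current_employer": 5,
    "debt_to_income_ratio": 0.25,
}))
```

Asking for one line per step also makes the model’s trace easy to compare, step by step, against the bank’s own checks.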

Initial Assessment: The model first evaluates the applicant’s credit score and categorizes it (e.g., excellent, good, fair, poor).
COT Step: “Applicant has a credit score of 750, which falls into the ‘excellent’ category.”
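The categorization in this step is a simple rule, so a bank could compute it independently and check that the category the model states matches. A minimal Python sketch; the band boundaries below are hypothetical, since the source does not define them:

```python
# Hypothetical score bands as (lower bound, label), highest first; a real bank would use its own policy.
SCORE_BANDS = [(750, "excellent"), (700, "good"), (650, "fair")]


def categorize_credit_score(score: int) -> str:
    """Return the first band whose lower bound the score meets, else 'poor'."""
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    return "poor"


def credit_score_step(score: int) -> str:
    """Phrase the result the way the COT step above does."""
    return f"Applicant has a credit score of {score}, which falls into the '{categorize_credit_score(score)}' category."


print(credit_score_step(750))
```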

Income Verification: Next, it verifies the applicant’s income against the requested loan amount to assess affordability.
COT Step: “Applicant’s income is Rs. 8,00,000 annually, which is sufficient for the requested loan amount of Rs. 10,00,000.”
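Affordability can be checked in the same way. This sketch assumes a hypothetical policy: the loan is affordable if its equated monthly installment (EMI) is at most 40% of monthly income, priced at an assumed 10% annual interest rate over 60 months. None of these figures come from the source:

```python
def monthly_emi(principal: float, annual_rate: float, months: int) -> float:
    """EMI = P * r * (1 + r)**n / ((1 + r)**n - 1), where r is the monthly interest rate.

    Assumes annual_rate > 0; a zero rate would need principal / months instead.
    """
    r = annual_rate / 12
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def income_step(annual_income: float, loan_amount: float, annual_rate: float = 0.10,
                months: int = 60, max_emi_share: float = 0.40) -> tuple[bool, str]:
    """Treat income as sufficient if the EMI is at most max_emi_share of monthly income (hypothetical rule)."""
    emi = monthly_emi(loan_amount, annual_rate, months)
    share = emi / (annual_income / 12)
    affordable = share <= max_emi_share
    verdict = "sufficient" if affordable else "not sufficient"
    note = (f"The estimated EMI is {share:.0%} of the applicant's monthly income, "
            f"so the income is {verdict} for the requested loan amount.")
    return affordable, note


print(income_step(800_000, 1_000_000))
```

Under these assumptions, a Rs. 10,00,000 loan has an EMI of about Rs. 21,247, roughly 32% of the applicant’s monthly income of about Rs. 66,667.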

Employment Stability: It reviews employment history to gauge job stability.
COT Step: “Applicant has been employed with the current employer for 5 years, indicating stable employment.”
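A minimal sketch of the stability check, assuming a hypothetical rule that two or more years with the current employer counts as stable; the source does not say where the line is drawn:

```python
# Hypothetical policy threshold, in years with the current employer.
MIN_YEARS_WITH_CURRENT_EMPLOYER = 2


def employment_step(years_with_current_employer: float) -> tuple[bool, str]:
    """Flag employment as stable when tenure with the current employer meets the threshold."""
    stable = years_with_current_employer >= MIN_YEARS_WITH_CURRENT_EMPLOYER
    verdict = "indicating stable employment" if stable else "which calls for a closer look at job stability"
    note = (f"Applicant has been employed with the current employer for "
            f"{years_with_current_employer:g} years, {verdict}.")
    return stable, note


print(employment_step(5)[1])
```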

Debt-to-Income Ratio: The model calculates the debt-to-income ratio to ensure the applicant isn’t over-leveraged.
COT Step: “Debt-to-income ratio is 25%, which is within the acceptable range.”
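The debt-to-income ratio is the applicant’s total recurring monthly debt payments divided by gross monthly income. A minimal sketch; the 40% cap and the sample figures are hypothetical, and the caller decides which debts (for example, the new loan’s installment) to include:

```python
MAX_DTI = 0.40  # hypothetical upper bound for an acceptable ratio


def debt_to_income(monthly_debt_payments: list[float], gross_monthly_income: float) -> float:
    """DTI = total recurring monthly debt payments / gross monthly income."""
    return sum(monthly_debt_payments) / gross_monthly_income


def dti_step(monthly_debt_payments: list[float], gross_monthly_income: float) -> tuple[bool, str]:
    """Check the ratio against the cap and phrase the result like the COT step above."""
    ratio = debt_to_income(monthly_debt_payments, gross_monthly_income)
    acceptable = ratio <= MAX_DTI
    verdict = "within" if acceptable else "above"
    return acceptable, f"Debt-to-income ratio is {ratio:.0%}, which is {verdict} the acceptable range."


# Example: two existing EMIs of Rs. 10,000 and Rs. 5,000 against a gross monthly income of Rs. 60,000.
print(dti_step([10_000, 5_000], 60_000)[1])
```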

Additional Factors: The model considers additional factors such as past loan repayment behavior and any recent financial hardships.
COT Step: “No history of missed payments on past loans. No recent bankruptcies or significant financial events.”
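These checks can be captured as a small record of the applicant’s history. A minimal sketch; the `CreditHistory` fields and the pass rule (no missed payments and no recent adverse events) are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class CreditHistory:
    missed_payments: int = 0
    # Hypothetical labels for recent adverse events, e.g. "bankruptcy".
    recent_adverse_events: list[str] = field(default_factory=list)


def additional_factors_step(history: CreditHistory) -> tuple[bool, str]:
    """Pass only when there are no missed payments and no recent adverse events."""
    notes = []
    if history.missed_payments == 0:
        notes.append("No history of missed payments on past loans.")
    else:
        notes.append(f"{history.missed_payments} missed payment(s) on past loans.")
    if history.recent_adverse_events:
        notes.append("Recent adverse events: " + ", ".join(history.recent_adverse_events) + ".")
    else:
        notes.append("No recent bankruptcies or significant financial events.")
    passed = history.missed_payments == 0 and not history.recent_adverse_events
    return passed, " ".join(notes)


print(additional_factors_step(CreditHistory())[1])
```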

Final Decision: Based on the step-by-step analysis, the model provides a detailed recommendation.
COT Step: “Based on excellent credit score, sufficient income, stable employment, acceptable debt-to-income ratio, and good repayment history, the loan application is approved.”
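The final step only needs the findings of the earlier steps, each as a short phrase paired with whether that step passed. A minimal sketch that approves when every step passes and otherwise refers the application to a human loan officer with the failing findings; the referral rule is an assumption, not something the source specifies:

```python
def join_findings(items: list[str]) -> str:
    """Join findings as an English list: 'a', 'a and b', or 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def final_decision(findings: list[tuple[str, bool]]) -> str:
    """Approve only when every step passed; otherwise route to a human, citing the failing steps."""
    failed = [text for text, passed in findings if not passed]
    if not failed:
        all_findings = [text for text, _ in findings]
        return f"Based on {join_findings(all_findings)}, the loan application is approved."
    return f"Referred to a loan officer for review because of {join_findings(failed)}."


print(final_decision([
    ("excellent credit score", True),
    ("sufficient income", True),
    ("stable employment", True),
    ("acceptable debt-to-income ratio", True),
    ("good repayment history", True),
]))
```

With all five findings passing, this reproduces the recommendation in the COT step above; a single failing finding turns the output into a referral that names it.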

In sum, if you are using an LLM or generative AI in a decision-making process, a COT style can break complex problems down into steps, making responses more accurate and easier to explain.

Here are some use cases for COT-based generative AI applications across industries:

  • Healthcare (Medical Diagnosis Assistance): AI assists doctors by breaking a patient’s complex symptoms and medical history into smaller, manageable segments. Analyzing each symptom individually and in combination with others helps form a comprehensive diagnosis, and the AI can recommend further tests, possible diagnoses, and treatment plans, which can improve the accuracy and speed of diagnosis.
  • Banking (Fraud Detection and Prevention): AI examines transaction data step by step, identifying unusual patterns and flagging potential fraud. By analyzing sequences of transactions and comparing them with historical fraud data, it can detect and prevent fraudulent activity more accurately.
  • Insurance (Claims Processing): COT-based generative AI can streamline claims processing by breaking claim details into smaller segments, assessing the validity of each part, and checking compliance with the policy terms. This step-by-step evaluation supports quick, accurate claim approvals while reducing errors and fraud.
  • Aviation (Flight Operations Optimization): AI can help optimize flight operations by analyzing factors such as weather conditions, fuel consumption, and air traffic data in a structured manner. Breaking these elements down lets it provide actionable insights that improve flight safety, efficiency, and on-time performance.
  • Fashion (Personalized Fashion Recommendations): AI can personalize fashion recommendations by analyzing current trends, customer preferences, and purchase history. Breaking these factors down lets it suggest outfits and designs tailored to individual tastes, improving customer satisfaction and sales.
  • Retail (Inventory Management): AI can improve retail inventory management by breaking down sales data, predicting future demand, and managing stock levels in real time. This helps retailers maintain optimal inventory levels, reduce the risk of overstock and stockouts, and improve overall supply chain efficiency.
  • Manufacturing (Predictive Maintenance): COT-based generative AI can predict equipment failures by breaking down sensor data, maintenance logs, and operational patterns. This detailed analysis enables proactive maintenance scheduling, reducing downtime and maintenance costs while keeping production running smoothly.

FAQs & Caveats

Is COT the holy grail for LLM bias and other issues?
Not exactly. While COT offers improvements over standard prompting, LLMs remain neural network models that generate text sequences based on probability. With or without COT, the system does not reason the way humans do.

GIGO still applies.
Garbage in, garbage out is an evergreen maxim in data science. If an LLM absorbed bias or errors during training, COT can do little to change the model’s results or its underlying reasoning capabilities. COT only helps you use an existing model more effectively.

Is Chain of Thought the same as Prompt Chaining?
They may sound similar, but they are distinct techniques. Chain of Thought (COT) breaks a single complex question into smaller steps, whereas Prompt Chaining takes a more dynamic approach, running multiple rounds of iteration in which the output of one query serves as the input to the next. The latter helps develop an idea iteratively, much like how humans brainstorm.
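To make the contrast concrete: a COT prompt is a single call that asks the model to reason step by step, while a prompt chain makes several calls and feeds each output into the next prompt. A minimal sketch of the chaining loop; `fake_llm` is a hypothetical stand-in for whatever model API you use, and the brainstorming templates are illustrative:

```python
from typing import Callable


def run_prompt_chain(llm: Callable[[str], str], templates: list[str], first_input: str) -> list[str]:
    """Run the templates in order; each model output fills the {previous} slot of the next template."""
    outputs = []
    previous = first_input
    for template in templates:
        previous = llm(template.format(previous=previous))
        outputs.append(previous)
    return outputs


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    return f"<model reply to: {prompt}>"


steps = [
    "List three product ideas for {previous}.",
    "Pick the most promising idea from this list and explain why: {previous}",
    "Write a one-paragraph pitch for this idea: {previous}",
]
for output in run_prompt_chain(fake_llm, steps, "a savings app for students"):
    print(output)
```

Passing the model in as a plain function keeps the loop independent of any particular provider; swapping `fake_llm` for a real client call is the only change needed.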

Reach out to us for a free consultation, or to discuss a POC in AI, cloud, or form-factor-agnostic new product development.