Mayank A.
These are the best posts from Mayank A.

4 viral posts with 3,527 likes, 255 comments, and 122 shares.
4 image posts, 0 carousel posts, 0 video posts, 0 text posts.


Best Posts by Mayank A. on LinkedIn

Nobody noticed when you improved it. Everyone noticed when it broke.

Life of a Software Developer 🤗

--------

➕ If that hit you deep, following Mayank is non-negotiable. 😃
Post image by Mayank A.
We've all shipped an LLM feature that "felt right" in dev, only to watch it break in production.

Why? Because human "eyeballing" isn't a scalable evaluation strategy.

The real challenge in building robust AI isn't just getting an LLM to generate an output. It's ensuring the output is right, safe, formatted, and useful, consistently, across thousands of diverse user inputs.

This is where Evaluation Metrics become non-negotiable. Think of them as the sophisticated unit tests and integration tests for your LLM's brain.

You need to move beyond "does it work?" to "how well does it work, and why?"

This is precisely what Comet's Opik is designed for. It provides the framework to rigorously grade your LLM's performance, turning subjective feelings into objective data.

Here's how we approach it, as shown in the cheat sheet below:
1./ Heuristic Metrics => the 'Linters' & 'Unit Tests'
- These are your non-negotiable, deterministic sanity checks.
- They are low-cost, fast, and catch objective failures.
- Your pipeline should fail here first.
โ–ซ๏ธIs it valid? โ†’ IsJson, RegexMatch
โ–ซ๏ธIs it faithful? โ†’ Contains, Equals
โ–ซ๏ธIs it close? โ†’ Levenshtein

2./ LLM-as-a-Judge => the 'Peer Review'
- This is for everything that "looks right" but might be subtly wrong.
- These metrics evaluate quality and nuance where statistical rules fail.
- They answer the hard, subjective questions.
โ–ซ๏ธIs it true? โ†’ Hallucination
โ–ซ๏ธIs it relevant? โ†’ AnswerRelevance
โ–ซ๏ธIs it helpful? โ†’ Usefulness

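A minimal sketch of the LLM-as-a-Judge pattern, with the actual model call left out. `build_judge_prompt` and `parse_judge_score` are hypothetical helpers (not Opik APIs): in practice you would send the prompt to your judge model and parse its reply with something like the second function.

```python
import re


def build_judge_prompt(question: str, answer: str) -> str:
    """Ask a judge LLM to grade an answer's relevance on a 1-5 scale."""
    return (
        "You are grading an answer for relevance to a question.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with 'Score: <1-5>' and one sentence of justification."
    )


def parse_judge_score(judge_reply: str) -> int:
    """Extract the numeric grade from the judge's free-text reply."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {judge_reply!r}")
    return int(match.group(1))
```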
3./ G-Eval => the dynamic 'Judge-Builder'
- G-Eval is a task-agnostic LLM-as-a-Judge.
- You define custom evaluation criteria in plain English (e.g., "Is the tone professional but not robotic?").
- It then uses Chain-of-Thought reasoning internally to analyze the output and produce a human-aligned score for those criteria.
- This allows you to test specific business logic without writing new code.
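The G-Eval recipe above amounts to compiling a plain-English criterion plus a chain-of-thought instruction into a judge prompt. This is an illustrative sketch of that idea, not the library's internals:

```python
def build_geval_prompt(criterion: str, output: str) -> str:
    """Turn a plain-English criterion into a chain-of-thought judge prompt."""
    return (
        f"Evaluation criterion: {criterion}\n"
        f"Output under review:\n{output}\n\n"
        "Think step by step: list evidence for and against the criterion, "
        "then give a final 'Score: <0-10>'."
    )
```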

4./ Custom Metrics
- For everything else.
- This is where you write your own Python code to create a metric.
- It's for when you need to check an output against a live internal API, a proprietary database, or any other logic that only your system knows.
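
A custom metric of this kind might, for example, verify that every product ID the model mentions actually exists in your catalog. Everything here is hypothetical: the `SKU-` naming scheme and the set are stand-ins for a live lookup against your own API or database.

```python
import re

# Stand-in for a proprietary catalog lookup (would be an API/DB call).
KNOWN_PRODUCT_IDS = {"SKU-100", "SKU-200", "SKU-300"}


def product_ids_valid(output: str) -> bool:
    """Custom metric: every product ID mentioned must exist in the catalog."""
    mentioned = set(re.findall(r"SKU-\d+", output))
    return mentioned.issubset(KNOWN_PRODUCT_IDS)
```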

Take a look at the cheat sheet for a quick breakdown.

Which metric are you implementing first for your current LLM project?

โ™ป๏ธ Don't forget to repost.
Post image by Mayank A.
We are too focused on achieving full autonomy instead of designing reliable, human-in-the-loop systems.

An AI agent is not a loyal employee you can give a vague goal to.

It's a super-fast, super-naive intern with access to your entire production system. Without strict guardrails, it will inevitably get lost.

The breakthrough in agents won't be a single, smarter model. It will be the development of a standard "operating system" for agents that provides memory, tool constraints, and verification steps.

4 Principles for Building Reliable Agents:

1./ Autonomous loops ⇢ Verifiable single steps.
Don't let the agent run free. Design it to propose a single next action, get automated or human approval, then execute.
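
This propose-approve-execute gate can be sketched as one function; the three callables are placeholders for your agent's planner, your approval check (automated or human), and your executor.

```python
from typing import Callable, Optional


def run_step(propose: Callable[[], dict],
             approve: Callable[[dict], bool],
             execute: Callable[[dict], str]) -> Optional[str]:
    """One verifiable step: propose a single action, gate it, then execute."""
    action = propose()
    if not approve(action):
        return None  # rejected: the agent never acts without approval
    return execute(action)
```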

2./ General tools ⇢ Constrained capabilities.
Don't give an agent raw command-line access. Give it a small, well-defined, and observable set of APIs it is allowed to call.
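
A constrained capability set is essentially an allowlist dispatcher. The tool names and stub implementations below are hypothetical; the point is that anything outside the registry is refused, not executed.

```python
# Small, well-defined, observable set of tools the agent may call.
ALLOWED_TOOLS = {
    "search_flights": lambda origin, dest: f"searching {origin}->{dest}",
    "get_hotel_rating": lambda hotel: 4.6,
}


def call_tool(name: str, *args):
    """Dispatch only through the allowlist; any other tool is refused."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not permitted")
    return ALLOWED_TOOLS[name](*args)
```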

3./ Vague goals ⇢ Concrete success criteria.
The task shouldn't be "plan a trip." It must be "find three flights under $200 and two hotels with a rating above 4.5, then present them in a valid JSON object."
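
A concrete success criterion like that one is machine-checkable. This sketch validates the trip-planning contract from the example; the field names (`flights`, `hotels`, `price`, `rating`) are assumptions for illustration.

```python
import json


def meets_criteria(raw: str) -> bool:
    """Valid JSON with exactly three flights under $200 and two hotels
    rated above 4.5 -- the example contract, checked mechanically."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:
        return False
    flights = plan.get("flights", [])
    hotels = plan.get("hotels", [])
    return (len(flights) == 3
            and all(f["price"] < 200 for f in flights)
            and len(hotels) == 2
            and all(h["rating"] > 4.5 for h in hotels))
```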

4./ Black box reasoning ⇢ A transparent audit trail.
The agent must "show its work." It should output its reasoning, the tools it used, and the outcome of its actions at every single step.
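
An audit trail can be as simple as one structured record per step. This is a minimal sketch; a real system would persist the trail rather than keep it in memory.

```python
import json
import time


def log_step(trail: list, thought: str, tool: str, outcome: str) -> None:
    """Append one transparent record per step: reasoning, tool, outcome."""
    trail.append({
        "ts": time.time(),
        "thought": thought,
        "tool": tool,
        "outcome": outcome,
    })


def dump_trail(trail: list) -> str:
    """Serialize the trail so every action can be reviewed later."""
    return json.dumps(trail, indent=2)
```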


-----------

You can follow Mayank for regular insights :)
Post image by Mayank A.
The only family tree I'm truly afraid of. Save this. You will need it. #datascience #statistics #machinelearning

This is your brain on Stats. 😉

Image Source - math(dot)wm(dot)edu
Post image by Mayank A.
