Pinned · Resolved · agent-behavior · accuracy · bug

Jarvis generated a wrong SQL fix even though the KB had the right procedure

3,987 views · 3 replies · 74 likes
Priya S.

Senior Engineer · Posted 23 days ago

Had a customer with a specific class of data issue. Our KB has a documented 4-step fix, including a very specific WHERE clause. Jarvis read the file (I can see the tool call), but then generated SQL with a different WHERE clause that would have affected the wrong rows. I caught it in the approval step.

This is the scary kind of failure mode. What's going on here?

3 Replies

Accepted answer
Nikhil K. · Staff · 23 days ago

Founder, CEO

Thank you for posting this. This is exactly the failure mode we lose sleep over, so let me give you the full picture.

When Jarvis reads a procedure file and then paraphrases instead of using the exact syntax, it's usually one of two things:

  1. The procedure file describes the intent ('remove duplicate orders for the customer') but doesn't include the exact SQL as a code block. Jarvis then generates SQL from its own understanding of the intent, and that SQL can drift from the documented fix.
  2. The procedure file has the SQL, but in prose form rather than inside a fenced code block. The model sometimes treats it as an example rather than a literal command to copy.

Both are KB structure issues more than agent behavior issues. Fix:

  • Put the exact command inside a fenced code block with a language tag (```sql)
  • Add an explicit instruction above: 'Use this SQL exactly as written. Do not modify the WHERE clause.'
  • For really sensitive procedures, mark the file with the front-matter flag 'mode: literal', which makes Jarvis refuse to paraphrase it
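
The three fixes above can be checked mechanically before a procedure file goes into the KB. What follows is a minimal sketch in Python, not part of Jarvis: the function names, the file layout, and the table and column names in the sample are all invented for illustration. Only the ```sql fence, the 'use exactly' wording, and the mode: literal flag come from the list above.

```python
import re

FENCE = "`" * 3  # built at runtime so the sample file does not nest code fences

# Hypothetical procedure file. Table and column names are made up.
PROCEDURE = f"""---
mode: literal
---
Step 3: remove duplicate orders for the customer.

Use this SQL exactly as written. Do not modify the WHERE clause.

{FENCE}sql
DELETE FROM orders
WHERE customer_id = :customer_id AND is_duplicate = TRUE;
{FENCE}
"""


def front_matter_lines(text: str) -> list[str]:
    """Return the lines between the leading '---' markers, or [] if absent."""
    match = re.match(r"---\n(.*?)\n---\n", text, flags=re.S)
    return match.group(1).splitlines() if match else []


def lint_procedure(text: str) -> list[str]:
    """Return the structure problems found in one KB procedure file."""
    problems = []

    # Fix 1: the exact command lives in a fenced block tagged sql.
    sql_block = re.compile(rf"^{FENCE}sql\n(.*?)^{FENCE}[ \t]*$", re.M | re.S)
    if not sql_block.search(text):
        problems.append("no fenced sql block")

    # Fix 2: an explicit 'use exactly' instruction appears in the file.
    if "exactly as written" not in text.lower():
        problems.append("no 'use exactly as written' instruction")

    # Fix 3: the front matter carries the literal-mode flag.
    if "mode: literal" not in [line.strip() for line in front_matter_lines(text)]:
        problems.append("front matter lacks 'mode: literal'")

    return problems


print(lint_procedure(PROCEDURE))
print(lint_procedure("Remove duplicate orders for the customer."))
```

The first call reports no problems for the structured file. The second reports all three for a file that only describes the intent in prose, which is the first cause listed above.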

In the meantime, the approval step catching this is the system working as designed. Destructive SQL is a tier 3 tool call and never auto-executes. You saw what you should have seen, and you had the chance to block it.

112
Priya S. · 22 days ago

Senior Engineer

Confirming the structure fix worked. Wrapped the SQL in a fenced block, added the 'use exactly' instruction, and Jarvis now copies it character-for-character. Tested on a non-prod customer.
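
A character-for-character check like this can be automated by diffing the SQL shown at the approval step against the block in the KB. This is a minimal sketch using Python's standard difflib module; the function name and both SQL strings are hypothetical examples, not output from Jarvis.

```python
import difflib


def sql_drift(kb_sql: str, generated_sql: str) -> list[str]:
    """Return unified-diff lines between the KB SQL and the generated SQL.

    An empty list means the two match exactly, apart from leading and
    trailing whitespace.
    """
    return list(
        difflib.unified_diff(
            kb_sql.strip().splitlines(),
            generated_sql.strip().splitlines(),
            fromfile="kb",
            tofile="generated",
            lineterm="",
        )
    )


# Hypothetical statements: the second one has lost part of the WHERE clause.
KB_SQL = """
DELETE FROM orders
WHERE customer_id = :customer_id AND is_duplicate = TRUE;
"""
DRIFTED_SQL = """
DELETE FROM orders
WHERE customer_id = :customer_id;
"""

print(sql_drift(KB_SQL, KB_SQL))
print("\n".join(sql_drift(KB_SQL, DRIFTED_SQL)))
```

An exact copy produces an empty diff. Any change to the WHERE clause shows up as a removed line and an added line, which is easier to spot in review than comparing the two statements by eye.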

28
Rin O. · 22 days ago

Ops Engineer

We went through this too. The front-matter 'mode: literal' flag is underdocumented and worth a pinned post of its own.

19