My latest LLM code nightmare

A customer needs to automate static code analysis in their integration workflow using a SAST tool.

The detailed task specification came from ChatGPT, which suggested running semgrep from the docker image semgrep/semgrep.

I thought it wasn't a bad idea. ChatGPT suggested integrating the tool as a pre-commit git hook, which is fair for local development but not for automated continuous integration. Ok, a pretty useless suggestion, but something to start from.

Asking the LLM agent

The tool adopted for CI/CD is Jenkins, so I started interacting with the LLM to get a Groovy pipeline that integrates semgrep/semgrep for code analysis.

I forgot to tell the LLM engine (Claude) that Jenkins itself runs as a container, managed by a docker swarm instance, on a VM separated from production (since docker swarm does not have namespaces, I fully agree with this arrangement).

The generated code relies on a separate script that executes

docker run -v $jenkinsworkspace:/code semgrep/semgrep ….

That's easy and cool. But wait: Jenkins runs inside a container; the agent used has the docker command, but it connects to the host's /var/run/docker.sock.

This means that there is only one docker daemon, the one on the host. When that daemon receives a command from the docker CLI, it takes the arguments as they are: mounting a volume from a path means mounting the host's path, because the host filesystem is the only one the daemon knows.

The script generated by the LLM tries to mount a container-local path that the docker daemon cannot see. I felt lazy and wanted to let the Claude agent fix the code for me.
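To make the failure mode concrete, here is a minimal sketch (the workspace path is illustrative, not the customer's actual layout):

    # Run from inside the Jenkins container, whose docker CLI talks to the HOST
    # daemon through the mounted /var/run/docker.sock
    docker run --rm -v "$WORKSPACE:/src" semgrep/semgrep semgrep scan --config auto /src
    # $WORKSPACE expands to something like /var/jenkins_home/workspace/my-job,
    # a path that exists only inside the Jenkins container. The host daemon
    # resolves that path on the HOST filesystem, where it is empty or missing,
    # so the semgrep container has nothing to scan.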

System architecture and pipeline design

In detail, the system architecture and pipeline for backend code (with or without SAST) was designed to perform these steps (sketched in plain docker commands after the list):

  1. build a docker image of the service
  2. run unit tests and integration tests on the newly created docker image (using docker-compose for side services)
  3. (deploy by) push the docker image to a private registry
  4. update the service in the production or dev environment
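
As a rough sketch in plain docker commands (the image name, registry and compose file are hypothetical):

    # 1. build a docker image of the service
    docker build -t registry.example.com/myservice:${BUILD_NUMBER} .
    # 2. run unit and integration tests against the new image, with side services
    #    provided by docker-compose
    docker-compose -f docker-compose.test.yml run --rm tests
    # 3. push the image to the private registry
    docker push registry.example.com/myservice:${BUILD_NUMBER}
    # 4. update the running service in the dev or production environment
    docker service update --image registry.example.com/myservice:${BUILD_NUMBER} myservice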

The first prompt I gave to the agent was to read the existing Jenkinsfile and integrate a SAST step.

Please Claude, fix it

I stated the problem and suggested using a volume.

The agent suggested code that:

  1. creates a volume
  2. extracts the code from the image into the new volume
  3. extracts the code from the volume into the container filesystem
  4. removes the volume
  5. creates a new volume for analysing the code
  6. copies the code from the container filesystem into the new volume
  7. analyses the code
  8. removes the new volume

Isn't there some repeated stuff here? The agent said no.

This story went on for half an hour…

The point was that the code was arranged in two bash functions, and the agent treated them as silos.

I didn't want to waste my time explaining things to the agent, so I refactored the code myself.

I started with the idea of doing things the lazy way; I ended up fighting the rigidity of the AI agent's habit of solving things by piling on operations. The final flow (sketched in code after the list) is:

  1. define a unique volume name (using the Jenkins job number)
  2. create the volume
  3. extract the code from the image into the volume
  4. analyse the code contained in the volume
  5. remove the volume
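
A minimal sketch of that flow, assuming the application sources live under /app inside the image and the image tag is passed as the first argument (both assumptions, not the customer's actual script):

    #!/usr/bin/env bash
    # Volume-based scan: no host path is involved, only a named docker volume
    set -euo pipefail

    IMAGE="$1"
    VOL="sast-code-${BUILD_NUMBER}"            # 1. unique volume name per Jenkins job

    docker volume create "$VOL"                # 2. create the volume

    # 3. extract the code from the image into the volume: a throwaway container
    #    copies the sources (assumed to be in /app) into the mounted volume
    docker run --rm -v "$VOL:/dest" "$IMAGE" sh -c 'cp -a /app/. /dest/'

    # 4. analyse the code contained in the volume
    docker run --rm -v "$VOL:/src" semgrep/semgrep \
        semgrep scan --config auto --json --output /src/semgrep-results.json /src

    # copy the report out of the volume so Jenkins can archive it
    docker run --rm -v "$VOL:/src" alpine cat /src/semgrep-results.json > semgrep-results.json

    docker volume rm "$VOL"                    # 5. remove the volume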

But I had to arrange the code this way by hand.

I must also say that the code generated by agents is full of checks: some of them are clever and nice to have, some are driven by paranoia. So part of the job is removing extra code.

The integrated SAST step in the pipeline

In the end, the Jenkins pipeline for the job builds new artifacts: SAST reports.

The newly generated artifacts can be inspected to fix the findings and release more secure code.

The step is defined as:

    stage('SAST Security Scan') {
        steps {
            script {
                echo "🔒 Starting SAST scan for JavaScript/Node.js application"

                // Extract and scan from built Docker image (code only exists in image)
                echo "🐳 Extracting and scanning code from Docker image: ${env.LOCTAG}"
                def sastExitCode = sh(
                    script: "./scripts/sast-scan-image.sh '${env.LOCTAG}'",
                    returnStatus: true
                )

                // Read results summary - check both locations
                def summaryContent = ""
                if (fileExists('sast-summary.txt')) {
                    summaryContent = readFile('sast-summary.txt').trim()
                } else if (fileExists('/tmp/sast-summary.txt')) {
                    summaryContent = readFile('/tmp/sast-summary.txt').trim()
                }

                if (summaryContent) {
                    env.SAST_RESULTS = summaryContent
                    echo "SAST Results: ${env.SAST_RESULTS}"

                    // Parse results for detailed logging
                    def results = env.SAST_RESULTS.split(',')
                    def highIssues = results[0].split(':')[1] as Integer
                    def mediumIssues = results[1].split(':')[1] as Integer
                    def lowIssues = results[2].split(':')[1] as Integer

                    echo """
                    🔍 SAST Scan Summary:
                    🔴 High Severity: ${highIssues}
                    🟡 Medium Severity: ${mediumIssues}
                    🟢 Low Severity: ${lowIssues}
                    """
                } else {
                    echo "⚠️ No SAST summary found - assuming no issues"
                    env.SAST_RESULTS = "HIGH:0,MEDIUM:0,LOW:0"
                }

                // Set build status based on SAST results
                if (sastExitCode == 2) {
                    currentBuild.result = 'FAILURE'
                    error("❌ SAST scan failed due to high severity security issues")
                } else if (sastExitCode == 1) {
                    currentBuild.result = 'UNSTABLE'
                    echo "⚠️ SAST scan marked build as unstable due to medium severity issues"
                } else {
                    echo "✅ SAST scan passed successfully"
                }
            }
        }
        post {
            always {
                // Copy SAST results from /tmp if they exist there
                sh '''
                    # Copy results from /tmp to workspace for archiving
                    cp /tmp/semgrep-*.json . 2>/dev/null || true
                    cp /tmp/semgrep-*.txt . 2>/dev/null || true
                    cp /tmp/sast-*.txt . 2>/dev/null || true
                '''

                // Archive all SAST results
                archiveArtifacts artifacts: 'semgrep-*.json, semgrep-*.txt, sast-*.txt',
                                 fingerprint: true,
                                 allowEmptyArchive: true

                // Display scan results in build description
                script {
                    if (env.SAST_RESULTS) {
                        def results = env.SAST_RESULTS.split(',')
                        def highIssues = results[0].split(':')[1]
                        def mediumIssues = results[1].split(':')[1]
                        def lowIssues = results[2].split(':')[1]

                        currentBuild.description = """
                            SAST: H:${highIssues} M:${mediumIssues} L:${lowIssues}
                        """.trim()
                    }
                }
            }
            failure {
                echo '❌ SAST scan failed - check security findings before proceeding'
            }
            unstable {
                echo '⚠️ SAST scan found medium severity issues - review before deployment'
            }
            success {
                echo '✅ SAST scan completed successfully'
            }
        }
    }

Here archiveArtifacts is what indexes the artifacts listed at the top of the Jenkins build page.
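
The pipeline above expects ./scripts/sast-scan-image.sh to write a one-line summary in the form HIGH:n,MEDIUM:n,LOW:n and to return 2, 1 or 0. A hypothetical tail of that script, assuming the semgrep JSON report already sits in the workspace and mapping semgrep severities to the three buckets, could look like:

    # Hypothetical end of sast-scan-image.sh (the jq filters and the severity
    # mapping ERROR/WARNING/INFO -> HIGH/MEDIUM/LOW are assumptions)
    HIGH=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' semgrep-results.json)
    MEDIUM=$(jq '[.results[] | select(.extra.severity == "WARNING")] | length' semgrep-results.json)
    LOW=$(jq '[.results[] | select(.extra.severity == "INFO")] | length' semgrep-results.json)

    echo "HIGH:${HIGH},MEDIUM:${MEDIUM},LOW:${LOW}" > sast-summary.txt

    # The exit code drives the build status: 2 -> FAILURE, 1 -> UNSTABLE, 0 -> OK
    if [ "$HIGH" -gt 0 ]; then exit 2; fi
    if [ "$MEDIUM" -gt 0 ]; then exit 1; fi
    exit 0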

Is agent mode a good or bad idea for coding?

I am still reluctant to adopt agent mode. Sometimes it suggests good ideas, but sometimes it uses those ideas in awful ways.

And worst of all, it keeps saying "you are perfectly right", while finding yet another idiotic way of producing unnecessary code.

I think that behind LLM usage there is an unspoken conflict of interest:

The more the agent interacts and creates crufty code, the more tokens are consumed and the more fees are charged.

This is not about getting things done; it is about giving away your money while thinking you have found the cheapest developer on the market: the LLM agent.

But somehow it helps you learn stuff. My orientation is to use a mix of agent mode and internal prompts, fixing things by hand when it is a matter of refactoring or clearly crufty code. I prefer to think of the LLM as a useful stochastic parrot.
