Ansible collection for agentless VM application discovery — modular runtime detectors with structured JSON output

migration.discovery -- Agentless Java Application Discovery

An Ansible collection that performs agentless discovery and deep inspection of Java applications on target VMs for migration planning. It identifies running Java application servers, extracts their configurations, introspects WAR/EAR deployments, scores each host's migration readiness, and produces per-host JSON and fleet-level HTML reports -- all without installing agents on target hosts.

Scope: Java applications only. Other runtimes (Node.js, Python, .NET, Go) are on the roadmap but not yet implemented.

Documentation Map

| Document | Description |
|---|---|
| CHANGELOG.md | Version history and release notes |
| AGENTS.md | Coding conventions for AI and human contributors |
| Architecture | |
| docs/architecture/ | 9 Mermaid diagrams: system, manifest gen flow, verdict logic, AAP integration, failure modes |
| docs/MTA_TO_CONTAINERFILE.md | Headline pitch: MTA + discovery + LLM → Containerfile (validated hybrid architecture) |
| docs/SBOM.md | CycloneDX SBOM generation via syft; tool selection rationale |
| docs/KAI_INTEGRATION.md | Kai LLM integration; kai_refine role; OpenRouter backend |
| Validation | |
| docs/E2E_PROOF.md | Application-level evidence hierarchy with honest gaps |
| docs/FLEET_VALIDATION.md | 10-VM / 12-app fleet pipeline results (100% LLM coverage via OpenRouter) |
| docs/INTERNAL_VERIFICATION.md | E2E build/deploy/verify results |
| docs/AAP_E2E_VALIDATION.md | AAP Job Template + Workflow validation |
| docs/MTA_ANALYZER_VALIDATION.md | MTA analyzer recipe + PetClinic submission |
| Quality | |
| docs/BLIND_RUN_AUDIT.md | Hardcoded assumption audit with severity ratings |
| docs/ANSIBLE_REVIEW.md | Ansible review findings + mitigations |
| docs/SECURITY.md | Validator account lifecycle, secret hygiene, LLM redaction rules |
| docs/OPENROUTER_MIGRATION.md | LiteLLM → OpenRouter migration record |
| Test artifacts | |
| docs/TEST_FLEET.md | Test fleet VM patterns and provisioning status |
| docs/fixtures/ | Reference JSON fixtures (fleet report, MTA partial) |
| Runbooks | |
| docs/runbooks/ | Operational runbooks for common workflows |

Step 1: Migration Readiness Report

Start here. Before any migration work begins, run the fleet report against your target VMs. This produces a customer-facing HTML report with Green/Yellow/Red verdicts for every host in the estate.

# Create an inventory of target VMs
cat > inventory.ini <<EOF
[all]
vm1 ansible_host=10.0.1.10
vm2 ansible_host=10.0.1.11
vm3 ansible_host=10.0.1.12

[all:vars]
ansible_user=deploy
ansible_become=true
EOF

# Run the Migration Readiness Assessment
ansible-playbook -i inventory.ini playbooks/report-fleet.yml \
  -e fleet_report_customer_name="Acme Corp" \
  -e fleet_report_engagement_name="Java Migration Assessment"

Output:

  • migration-readiness-report.html -- Self-contained HTML report (Red Hat branded, print-friendly)
  • fleet-report-data.json -- Structured JSON data reusable as input to the manifest-generation pipeline

The report includes:

  • Executive summary with Green/Yellow/Red donut and runtime breakdown
  • Per-host detail cards with JVM config, deployments, JDBC connections, keystores, secrets
  • Risk register aggregated across the fleet
  • Wave-based recommendations (Green first, then Yellow, then Red for architecture review)
  • Migration checklists per host
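
The wave-based ordering above (Green first, then Yellow, then Red) can be sketched in a few lines of Python. This is only an illustration: the `plan_waves` helper and its input shape (a mapping of host name to verdict string) are assumptions, not the structure of fleet-report-data.json.

```python
from collections import defaultdict

# Wave order taken from the report description: Green, then Yellow, then Red.
WAVE_ORDER = ["green", "yellow", "red"]


def plan_waves(verdicts):
    """Group hosts into migration waves by verdict.

    `verdicts` maps host name -> verdict string (hypothetical input shape).
    Hosts with any other verdict, such as "gray", are left out of the plan.
    """
    by_verdict = defaultdict(list)
    for host, verdict in verdicts.items():
        by_verdict[verdict.lower()].append(host)

    waves = []
    for verdict in WAVE_ORDER:
        hosts = sorted(by_verdict.get(verdict, []))
        if hosts:
            # Number waves consecutively, skipping verdicts with no hosts.
            waves.append({"wave": len(waves) + 1, "verdict": verdict, "hosts": hosts})
    return waves


if __name__ == "__main__":
    for wave in plan_waves({"vm1": "Green", "vm2": "Red", "vm3": "Yellow"}):
        print(wave)
```

Empty verdict groups are skipped so that wave numbers stay consecutive.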

The same JSON data feeds directly into the generate_manifests role to produce Containerfiles, Kubernetes Deployments, Services, NetworkPolicies, and ExternalSecrets.

AAP Integration

Register as a Job Template in Ansible Automation Platform:

| Field | Value |
|---|---|
| Playbook | playbooks/report-fleet.yml |
| Extra Variables | fleet_report_customer_name, fleet_report_engagement_name |
| Inventory | Target VMs |
| Credentials | Machine credential with SSH + become |

The HTML report is saved as a job artifact viewable in the AAP UI.


What This Collection Does

Point this collection at any Linux VM (via SSH) and it will:

  1. Inventory the OS, hardware, packages, services, processes, and listening ports
  2. Discover all installed Java environments (JDK/JRE locations, versions, vendors)
  3. Match findings against a library of Java application detector definitions
  4. Extract configuration files and runtime details for each detected application
  5. Perform deep inspection: JVM analysis, keystore enumeration, JDBC parsing, WAR introspection, build file analysis
  6. Score each host for migration readiness (Green / Yellow / Red)
  7. Produce per-host JSON reports, a fleet HTML report, and an MTA handoff section for downstream migration tooling

Supported Java application detectors:

| Application | Category | Key Detection Signals | Status |
|---|---|---|---|
| Apache Tomcat | Application Server | catalina process, port 8080/8443 | Validated |
| WildFly / JBoss EAP | Application Server | jboss-modules.jar, port 9990 | Validated |
| Spring Boot | Application Server | JarLauncher process, spring .jar | Experimental |
| Oracle WebLogic | Application Server | weblogic.Server, port 7001/7002 | Experimental |
| IBM WebSphere Traditional | Application Server | WsServer, port 9080/9443 | Experimental |
| IBM WebSphere Liberty | Application Server | ws-server.jar, port 9080/9443 | Experimental |

Experimental detectors have detection patterns defined but have not been validated against a live instance. Use at your own risk and verify results manually.

Architecture

Layer 1: GATHER       Layer 2: DETECT+EXTRACT    Layer 3: DEEP INSPECT         Layer 4: REPORT
+---------------+     +--------------------+     +-----------------------+     +-----------------+
| gather_facts  | --> | detect_apps        | --> | deep_inspect          | --> | report          |
|               |     | extract_config     |     |                       |     | fleet_report    |
| - OS/hardware |     | - Load detectors   |     | - JVM analysis        |     | gen_manifests   |
| - packages    |     | - Pattern matching |     | - Java version        |     |                 |
| - services    |     | - Confidence score |     | - Keystores           |     | - JSON per-host |
| - processes   |     | - Config reading   |     | - JDBC connections    |     | - HTML fleet    |
| - ports       |     | - Version detect   |     | - WAR introspection   |     | - G/Y/R scoring |
| - Java envs   |     | - Sensitive redact |     | - Build file analysis |     | - Containerfile |
| - filesystem  |     |                    |     | - Log config          |     | - K8s manifests |
+---------------+     +--------------------+     | - Network conns       |     +-----------------+
                                                 | - Env vars, cron      |
                                                 | - System packages     |
                                                 | - Secrets detection   |
                                                 +-----------------------+

Role: gather_facts

Collects raw data from the target host using Ansible built-in modules (setup, package_facts, service_facts) and shell commands (ps auxww, ss -tlnp, find). Discovers all installed Java environments by scanning /usr/lib/jvm, /usr/java, /opt/java, and alternatives. All results are stored under the discovered_facts namespace.
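
To make the port-gathering step concrete, here is a minimal Python sketch of how the text output of ss -tlnp could be reduced to a list of listening ports. The role does this with Ansible tasks; the `parse_listening_ports` helper is a hypothetical name used only for illustration.

```python
def parse_listening_ports(ss_output):
    """Return the sorted, unique TCP ports found in `ss -tlnp` text output."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; this also skips the header row.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Field 4 is "Local Address:Port". The port follows the last colon,
        # which also handles IPv6 forms such as "[::]:8443".
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)


if __name__ == "__main__":
    sample = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*         users:(("sshd",pid=901,fd=3))
LISTEN 0      100    *:8080             *:*               users:(("java",pid=1234,fd=45))
LISTEN 0      100    [::]:8443          [::]:*            users:(("java",pid=1234,fd=46))
"""
    print(parse_listening_ports(sample))
```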

Role: detect_apps

Loads YAML detector definitions from roles/detect_apps/files/detectors/ and matches them against gathered facts using five methods:

  • Process patterns -- regex against running process command lines
  • Package patterns -- regex against installed package names
  • Service patterns -- regex against registered services
  • Port patterns -- exact match against listening port numbers
  • Filesystem patterns -- glob-to-regex against discovered directories and marker files

Confidence is scored by how many methods matched: high (3+), medium (2), low (1).
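
The matching and scoring rules above can be approximated in Python as follows. This is a sketch under stated assumptions, not the role's implementation: the shape of the `facts` dictionary and the function names are hypothetical, and glob-to-regex conversion is done here with the standard library's `fnmatch.translate`.

```python
import fnmatch
import re


def match_detector(detector, facts):
    """Return the names of the detection methods that matched.

    `detector` follows the detector schema (detect.processes, detect.ports, ...).
    `facts` is a hypothetical dict of gathered data: lists under the keys
    "processes", "packages", "services", "ports" and "paths".
    """
    rules = detector.get("detect", {})
    matched = []

    def any_regex(entries, candidates):
        return any(re.search(e["pattern"], c) for e in entries for c in candidates)

    if any_regex(rules.get("processes", []), facts.get("processes", [])):
        matched.append("process")
    if any_regex(rules.get("packages", []), facts.get("packages", [])):
        matched.append("package")
    if any_regex(rules.get("services", []), facts.get("services", [])):
        matched.append("service")
    # Ports are compared exactly, not as patterns.
    if set(rules.get("ports", [])) & set(facts.get("ports", [])):
        matched.append("port")
    # Globs are translated to anchored regexes before matching paths.
    globs = [re.compile(fnmatch.translate(g)) for g in rules.get("filesystem", [])]
    if any(rx.match(path) for rx in globs for path in facts.get("paths", [])):
        matched.append("filesystem")
    return matched


def confidence(matched_methods):
    """high for 3+ matched methods, medium for 2, low for 1, None for 0."""
    count = len(matched_methods)
    if count >= 3:
        return "high"
    if count == 2:
        return "medium"
    if count == 1:
        return "low"
    return None
```

Note that `fnmatch` lets `*` match path separators too, so a pattern such as `*/bin/catalina.sh` matches at any depth.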

Role: extract_config

For each detected application, this role:

  • Determines the application home directory from process arguments or filesystem paths
  • Reads configuration files (capped at 1MB each) with sensitive-field redaction
  • Captures JVM arguments, environment variables, and specific listening ports
  • Runs version detection commands
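
As an illustration of sensitive-field redaction, the sketch below masks values whose key or attribute name looks like a secret. The regular expression is an assumption chosen for the example; the collection's actual redaction patterns may differ, and the 1MB read cap is not shown.

```python
import re

# Hypothetical pattern: a key or attribute name containing a secret-looking
# word, then "=" or ":", then a quoted or bare value.
SECRET_RE = re.compile(
    r"""(?ix)
    \b(\w*(?:password|passwd|secret|keystorepass)\w*)   # key name
    (\s*[=:]\s*)                                        # separator
    ("[^"]*"|'[^']*'|\S+)                               # value
    """
)


def _mask(match):
    key, separator, value = match.groups()
    # Keep the original quoting style so XML and properties files stay readable.
    quote = value[0] if value[0] in "\"'" else ""
    return f"{key}{separator}{quote}[REDACTED]{quote}"


def redact(text):
    """Replace secret-looking values with [REDACTED]."""
    return SECRET_RE.sub(_mask, text)


if __name__ == "__main__":
    print(redact('<Connector port="8443" keystorePass="changeit"/>'))
    print(redact("db.password = hunter2"))
```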

Role: deep_inspect

For each detected Java application, performs deep analysis:

  • JVM analysis: Parses full command line for -Xms, -Xmx, -XX:* flags, -D system properties, classpath, -javaagent entries, and GC algorithm
  • Java version: Runs java -version from the process's actual java binary (from /proc/PID/cmdline and /proc/PID/exe)
  • Keystores: Finds *.jks, *.p12, *.pfx, *.keystore files under app home and JVM arg paths. For each, runs keytool -list -v to extract alias, DN, and expiry (never key material)
  • JDBC connections: Parses config files for JDBC URLs -- extracts host, port, database, driver
  • System packages: rpm -qa or dpkg -l filtered by java, jdk, tomcat, jboss, etc.
  • Environment variables: Captures JAVA_HOME, CATALINA_HOME, JBOSS_HOME, WAS_HOME, MW_HOME, CLASSPATH from /proc/PID/environ, setenv.sh, and systemd units
  • Cron jobs: Parses crontabs for anything referencing the app
  • Log configuration: Finds log4j.properties, log4j2.xml, logback.xml, logging.properties -- extracts log file paths
  • Network connections: ss -tnp for the app's PID -- external hosts/ports it connects to
  • WAR/EAR introspection: Lists deployments (capped at 20), extracts web.xml (servlets, filters, listeners), Maven coordinates from pom.properties, MANIFEST.MF, WEB-INF/lib jar inventory, Spring Boot and JPA detection
  • Build file analysis: Parses pom.xml and build.gradle for dependencies, frameworks (Spring Boot, Java EE, Jakarta EE), database drivers, and messaging libraries
  • Secrets detection: Scans config files for known secret patterns (keystorePass, JDBC passwords, etc.)
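
The JVM analysis step lends itself to a short example. The following Python sketch splits a Java command line and picks out the flags listed above; the output keys mirror the jvm section of the example report, but the function itself and the GC flag table are illustrative assumptions, and classpath handling is omitted.

```python
import shlex

# Map of GC selection flags to short names (an illustrative subset).
GC_FLAGS = {
    "-XX:+UseG1GC": "G1GC",
    "-XX:+UseParallelGC": "ParallelGC",
    "-XX:+UseSerialGC": "SerialGC",
    "-XX:+UseZGC": "ZGC",
    "-XX:+UseShenandoahGC": "ShenandoahGC",
    "-XX:+UseConcMarkSweepGC": "CMS",
}


def parse_jvm_cmdline(cmdline):
    """Extract heap sizes, -XX flags, -D properties and agents from a command line."""
    result = {
        "heap_min": None,
        "heap_max": None,
        "gc_algorithm": None,
        "system_properties": {},
        "jvm_agents": [],
        "xx_flags": [],
    }
    for arg in shlex.split(cmdline):
        if arg.startswith("-Xms"):
            result["heap_min"] = arg[4:]
        elif arg.startswith("-Xmx"):
            result["heap_max"] = arg[4:]
        elif arg.startswith("-XX:"):
            result["xx_flags"].append(arg)
            if arg in GC_FLAGS:
                result["gc_algorithm"] = GC_FLAGS[arg]
        elif arg.startswith("-javaagent:"):
            result["jvm_agents"].append(arg[len("-javaagent:"):])
        elif arg.startswith("-D") and len(arg) > 2:
            # -Dkey=value; a property without "=" gets an empty value.
            key, _, value = arg[2:].partition("=")
            result["system_properties"][key] = value
    return result
```

A real implementation would also stop at the main class or -jar argument so that application arguments beginning with -D are not mistaken for JVM options.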

Role: report

Renders a Jinja2 template into a JSON report at {{ output_dir }}/{{ inventory_hostname }}.json and prints a human-readable summary to stdout.

Role: fleet_report

Collects discovery data from all hosts in the play, merges deep inspection results into each detected application, and scores every host for migration readiness:

  • Green -- Ready to containerize (0-2 migration flags)
  • Yellow -- Requires targeted manual work (3-5 flags: secrets, JDBC, keystores)
  • Red -- Needs architecture review (6+ flags: complex runtimes, clustering)
  • Gray -- No Java applications detected
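
The thresholds above translate directly into a small function. This Python sketch assumes the number of migration flags has already been counted for the host; how the role derives that count from the discovery data is not shown here.

```python
def verdict(java_app_count, flag_count):
    """Map a host's Java app count and migration flag count to a verdict."""
    if java_app_count == 0:
        return "Gray"    # no Java applications detected
    if flag_count <= 2:
        return "Green"   # 0-2 flags: ready to containerize
    if flag_count <= 5:
        return "Yellow"  # 3-5 flags: targeted manual work
    return "Red"         # 6+ flags: architecture review


if __name__ == "__main__":
    for apps, flags in [(0, 0), (1, 1), (1, 4), (2, 7)]:
        print(apps, flags, verdict(apps, flags))
```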

Produces:

  • Self-contained HTML report (migration-readiness-report.html) with executive summary, per-host detail cards, risk register, and wave-based recommendations
  • Structured JSON (fleet-report-data.json) reusable as input to generate_manifests

Role: generate_manifests

Generates enterprise-grade OpenShift manifests from discovery data:

  • Containerfile -- UBI9 base, non-root USER 185, OCI labels
  • Deployment -- Liveness/readiness probes, security context (restricted-v2), resource limits from JVM heap, OTEL annotations
  • Service -- ClusterIP for discovered ports
  • Route -- TLS edge or passthrough (based on keystore detection)
  • ConfigMap -- JVM opts, system properties
  • ExternalSecret -- Vault references for discovered secrets
  • NetworkPolicy -- Ingress/egress rules from discovered connections
  • Kustomization -- oc apply -k bundle

Role: kai_refine

LLM-assisted Containerfile refinement. Takes the deterministic Containerfile produced by generate_manifests and uses an LLM (via Kai) to suggest improvements: missing system packages, multi-stage build optimization, layer ordering, and security hardening. Toggle with use_kai_containerfile: true|false (default false). See docs/KAI_INTEGRATION.md for details.

Role: drift_detect

Detects configuration drift between the source VM and its containerized counterpart. Compares JVM args, environment variables, ports, JDBC connections, and system packages. Produces a drift report highlighting differences that may indicate migration regressions.
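
Conceptually, the comparison is a field-by-field diff of two snapshots. The Python sketch below shows one way to express that idea; the field names and the `detect_drift` helper are hypothetical and do not describe the role's actual data model.

```python
# Hypothetical field names for the compared categories.
COMPARED_FIELDS = [
    "jvm_args",
    "environment_vars",
    "ports",
    "jdbc_connections",
    "system_packages",
]


def _normalize(value):
    # Treat lists as unordered; sorting by repr also works for lists of dicts.
    if isinstance(value, list):
        return sorted(value, key=repr)
    return value


def detect_drift(source, container):
    """Return {field: {"source": ..., "container": ...}} for fields that differ."""
    drift = {}
    for field in COMPARED_FIELDS:
        before = _normalize(source.get(field))
        after = _normalize(container.get(field))
        if before != after:
            drift[field] = {"source": before, "container": after}
    return drift


if __name__ == "__main__":
    vm = {"ports": [8080, 8443], "jvm_args": ["-Xmx2048m"]}
    pod = {"ports": [8443, 8080], "jvm_args": ["-Xmx1024m"]}
    print(detect_drift(vm, pod))
```

Lists are compared without regard to order, so a reordered port list is not reported as drift.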

Run via the dedicated playbook:

ansible-playbook -i inventory.ini playbooks/detect-drift.yml \
  -e drift_detect_source_host=legacy-tomcat \
  -e drift_detect_container_ns=test-fleet-monolith

Usage

Prerequisites

  • Ansible 2.12+ on the control node
  • syft (optional, for SBOM generation): curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh
  • SSH access to target hosts
  • become (sudo) privileges on target hosts for reading config files and /proc

Install the collection

# From the repository
ansible-galaxy collection install git+https://git.arsalan.io/anaeem/ansible-collection-discovery.git

# Or from a local checkout
cd /path/to/ansible-collection-discovery
ansible-galaxy collection build
ansible-galaxy collection install migration-discovery-2.0.0.tar.gz

Run discovery

# Against a single host
ansible-playbook -i "target-vm," playbooks/discover.yml

# Against an inventory
ansible-playbook -i inventory.ini playbooks/discover.yml

# With custom output directory
ansible-playbook -i inventory.ini playbooks/discover.yml -e output_dir=/opt/reports

# Limit to specific hosts
ansible-playbook -i inventory.ini playbooks/discover.yml --limit webservers

Build and deploy

After discovery and manifest generation, use the build-and-deploy playbook to build the container image and deploy to OpenShift:

ansible-playbook playbooks/build-and-deploy.yml \
  -e app_name=classic-monolith \
  -e source_host=172.16.2.44 \
  -e target_namespace=test-fleet-monolith

Detect drift

After deploying, run drift detection to compare source VM and container:

ansible-playbook -i inventory.ini playbooks/detect-drift.yml \
  -e drift_detect_source_host=legacy-tomcat \
  -e drift_detect_container_ns=test-fleet-monolith

AAP Integration (Full Path)

For Ansible Automation Platform users, the recommended setup is:

| AAP Resource | Details |
|---|---|
| Project | SCM pointing to this collection's Git repository, branch main, SCM update on launch |
| Job Template 1 | "Step 1 -- Migration Readiness Report" using playbooks/report-fleet.yml with a survey for fleet_report_customer_name and fleet_report_engagement_name |
| Job Template 2 | "Step 2 -- Discovery + Manifest Generation" using playbooks/discover.yml |
| Workflow Template | Chain JT1 (fleet report) on success into JT2 (manifest generation) for a full pipeline run |
| Inventory | Target VMs with Machine credential (SSH + become) |

The fleet report HTML is exposed as a job artifact via set_stats. See docs/AAP_E2E_VALIDATION.md for validated results.

Inventory example

[java_servers]
tomcat01.example.com
jboss01.example.com
weblogic01.example.com

[all:vars]
ansible_user=deploy
ansible_become=true

How to Add a New Detector

Adding a new Java application detector requires only creating a single YAML file in roles/detect_apps/files/detectors/. No code changes are needed. The file is loaded automatically on the next playbook run.

Detector Schema Reference

Every field explained:

# ==============================================================
# REQUIRED FIELDS
# ==============================================================

name: "Human Readable Name"
# Display name shown in reports and logs.
# Example: "Apache Tomcat", "Oracle WebLogic"

id: "snake_case_id"
# Unique identifier used as dictionary keys in Ansible facts.
# Must be unique across all detectors. Use lowercase with underscores.
# Example: "tomcat", "jboss_wildfly", "websphere_liberty"

category: "application_server"
# Classification for grouping in reports.
# Values: application_server, web_server, database, runtime, middleware, monitoring

# ==============================================================
# DETECTION RULES (at least one section should have entries)
# ==============================================================

detect:
  processes:
  # List of regex patterns matched against `ps auxww` command lines.
  # Each entry has a single "pattern" key containing a regex string.
  # Backslash-escape dots in package names: "org\\.apache" not "org.apache"
    - pattern: "org\\.apache\\.catalina\\.startup\\.Bootstrap"
    - pattern: "-Dcatalina\\.home="

  packages:
  # List of regex patterns matched against installed RPM/DEB package names.
    - pattern: "^tomcat"
    - pattern: "^java.*openjdk"

  services:
  # List of regex patterns matched against systemd/sysvinit service names.
    - pattern: "tomcat"

  ports:
  # List of TCP port numbers (integers) matched against listening ports.
  # Exact match only -- no ranges.
    - 8080
    - 8443

  filesystem:
  # List of glob patterns matched against discovered directory paths.
  # Supports * wildcard. Matched against /opt, /usr/local, /var/lib,
  # /home, /srv, /app, /u01 (configurable via scan_dirs).
    - "/opt/tomcat*"
    - "*/apache-tomcat-*"
    - "*/bin/catalina.sh"

# ==============================================================
# VERSION DETECTION
# ==============================================================

version_command: "{app_home}/bin/version.sh 2>/dev/null | grep 'Server version' || echo 'unknown'"
# Shell command to determine the application version.
# {app_home} is replaced with the detected application home directory at runtime.
# Should output a single line containing the version string.
# Use 2>/dev/null to suppress errors, and end with a fallback (here || echo 'unknown')
# so the command still prints one line when detection fails.

# ==============================================================
# CONFIGURATION FILES
# ==============================================================

config_files:
# List of configuration files to read when this application is detected.
  - name: "server_xml"
    # Identifier used as dictionary key in the report. Use snake_case.
    path: "{app_home}/conf/server.xml"
    # File path. {app_home} is replaced at runtime.

  - name: "tomcat_users_xml"
    path: "{app_home}/conf/tomcat-users.xml"
    sensitive: true
    # When true, passwords/secrets in this file are redacted with [REDACTED]
    # before being stored in the report.

  - name: "conf_dir"
    path: "{app_home}/conf.d/"
    directory: true
    # When true, lists directory contents instead of reading file content.

# ==============================================================
# HOME DIRECTORY DETECTION
# ==============================================================

home_detection:
  process_arg: "-Dcatalina.home="
  # JVM system property or command-line argument that contains the app home path.
  # Extracted from the process command line via grep.

  process_arg_alt: "-Dcatalina.base="
  # Fallback argument to try if process_arg yields nothing.

  default_home: "/opt/tomcat"
  # Static fallback path if neither process argument is found.

  default_home_alt: "/usr/share/tomcat"
  # Second static fallback path.

  env_vars:
  # Environment variables that may contain the home path.
  # Checked in /proc/PID/environ during deep inspection.
    - CATALINA_HOME
    - CATALINA_BASE

# ==============================================================
# DEEP INSPECTION FLAGS (optional)
# ==============================================================

war_introspection: true
# When true, the deep_inspect role will look for WAR/EAR files in the
# deployment directory and introspect each one (web.xml, pom.properties,
# MANIFEST.MF, WEB-INF/lib, Spring Boot detection, JPA detection).
# Capped at 20 deployments.

deployment_types:
# List of deployment artifact types this app server supports.
# Used to guide WAR/EAR introspection.
  - war
  - ear
  - jar

secrets_in_config:
# List of "filename:attribute" pairs that indicate where secrets live
# in this application's configuration files. The deep_inspect role will
# check if these exist and flag them in the report.
  - "server.xml:keystorePass"
  - "context.xml:password"

build_files:
# List of build file names to look for in the app home directory.
# When found, the deep_inspect role parses them for dependencies,
# frameworks, database drivers, and messaging libraries.
  - pom.xml
  - build.gradle
  - build.gradle.kts

# ==============================================================
# VERSION DETECTION METADATA (optional, used by deep_inspect)
# ==============================================================

version_detection:
  file: "registry.xml"
  # File relative to app_home that contains version information.
  alt_file: "lib/weblogic.jar"
  # Alternate location for version info.
  method: "manifest"
  # How to extract version: "manifest" (JAR MANIFEST.MF), "xml_element"
  # (XML tag), "properties" (Java properties file).
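
Because detectors are plain data, a new file can be sanity-checked before it is dropped into the detectors directory. The Python sketch below checks the required fields and detection rules described in the schema; it is a hypothetical helper, not part of the collection, and it takes an already-parsed dictionary (for example from a YAML loader) rather than reading the file itself.

```python
import re

REQUIRED_FIELDS = ("name", "id", "category")
CATEGORIES = {
    "application_server", "web_server", "database",
    "runtime", "middleware", "monitoring",
}
REGEX_SECTIONS = ("processes", "packages", "services")
ALL_SECTIONS = REGEX_SECTIONS + ("ports", "filesystem")


def validate_detector(detector):
    """Return a list of human-readable problems; an empty list means it looks valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not detector.get(field):
            errors.append(f"missing required field: {field}")

    detector_id = detector.get("id")
    if detector_id and not re.fullmatch(r"[a-z][a-z0-9_]*", str(detector_id)):
        errors.append("id must be lowercase snake_case")

    category = detector.get("category")
    if category and category not in CATEGORIES:
        errors.append(f"unknown category: {category}")

    detect = detector.get("detect") or {}
    if not any(detect.get(section) for section in ALL_SECTIONS):
        errors.append("detect: at least one section should have entries")

    for section in REGEX_SECTIONS:
        for entry in detect.get(section) or []:
            try:
                re.compile(entry["pattern"])
            except (re.error, KeyError, TypeError) as exc:
                errors.append(f"detect.{section}: invalid entry {entry!r} ({exc})")

    for port in detect.get("ports") or []:
        if not isinstance(port, int):
            errors.append(f"detect.ports: {port!r} is not an integer")
    return errors
```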

Example: adding an Apache Kafka detector

Create roles/detect_apps/files/detectors/kafka.yml:

name: Apache Kafka
id: kafka
category: middleware
detect:
  processes:
    - pattern: "kafka\\.Kafka"
    - pattern: "kafka-server-start"
  packages:
    - pattern: "kafka"
  services:
    - pattern: "kafka"
  ports:
    - 9092
    - 9093
  filesystem:
    - "/opt/kafka*"
    - "*/kafka/config"
version_command: "{app_home}/bin/kafka-server-start.sh --version 2>/dev/null | head -1"
config_files:
  - name: server_properties
    path: "{app_home}/config/server.properties"
  - name: log4j_properties
    path: "{app_home}/config/log4j.properties"
home_detection:
  process_arg: "-Dkafka.logs.dir="
  default_home: "/opt/kafka"

That is all. The next time the playbook runs, Kafka will be included in detection.

How Java Apps Are Configured in the Wild

Java application servers share common patterns that this collection captures:

JVM Configuration: Heap sizes (-Xms/-Xmx), garbage collector selection (-XX:+UseG1GC), system properties (-Djava.io.tmpdir), and Java agents (-javaagent:) are passed on the command line. These are critical for capacity planning during migration.

Configuration Files: Each app server has its own configuration layout (server.xml for Tomcat, standalone.xml for JBoss, config.xml for WebLogic) but they all define similar things: datasources, thread pools, security realms, and clustering. Passwords and connection strings live in these files.

Deployments: Applications are packaged as WAR (web) or EAR (enterprise) archives and dropped into a deployment directory. Each WAR contains WEB-INF/web.xml (servlet mappings), WEB-INF/lib/ (dependencies), and optionally Maven metadata and Spring Boot configuration.

Keystores: TLS certificates are stored in Java KeyStore (.jks) or PKCS12 (.p12) files. The keystore path and password are referenced in server configuration. Many production systems still use the default password changeit.

External Dependencies: Applications connect to databases (via JDBC), message queues (JMS/Kafka/RabbitMQ), and caches. These connections are defined in server config files or application properties and represent dependencies that must be available after migration.
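
As an example of how such a dependency can be recovered from configuration text, the Python sketch below extracts JDBC URLs of the common `jdbc:<vendor>://host:port/database` form. The regular expression and the small driver table are assumptions for illustration; other URL styles, such as Oracle thin URLs, would need their own patterns.

```python
import re

JDBC_RE = re.compile(
    r"""jdbc:(?P<vendor>[a-z0-9]+)://      # subprotocol such as postgresql
        (?P<host>[^:/?;\s"'<]+)            # host name or address
        (?::(?P<port>\d+))?                # optional port
        (?:/(?P<database>[^?;\s"'<]+))?    # optional database name
    """,
    re.VERBOSE,
)

# Illustrative subset of vendor -> driver class names.
DRIVERS = {
    "postgresql": "org.postgresql.Driver",
    "mysql": "com.mysql.cj.jdbc.Driver",
    "mariadb": "org.mariadb.jdbc.Driver",
}


def parse_jdbc_urls(text, source_file=None):
    """Return one dict per JDBC URL found in `text`."""
    connections = []
    for match in JDBC_RE.finditer(text):
        connections.append({
            "source_file": source_file,
            "url": match.group(0),
            "host": match.group("host"),
            "port": match.group("port"),          # kept as a string, or None
            "database": match.group("database"),
            "driver": DRIVERS.get(match.group("vendor")),
        })
    return connections


if __name__ == "__main__":
    xml = '<Resource name="jdbc/app" url="jdbc:postgresql://db.internal:5432/appdb"/>'
    print(parse_jdbc_urls(xml, "/opt/tomcat/conf/context.xml"))
```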

Environment Injection: Configuration is often injected via setenv.sh (Tomcat), standalone.conf (JBoss), setDomainEnv.sh (WebLogic), or systemd unit Environment directives. These scripts set JAVA_HOME, JAVA_OPTS, CATALINA_OPTS, and custom properties.

MTA Integration Boundary

Each application in the report includes an mta_handoff section that clearly delineates responsibilities:

This collection discovers (infrastructure level):

  • Runtime JVM configuration (heap, GC, system properties, agents)
  • Listening ports and external network connections
  • Keystores with certificate details
  • JDBC datasource URLs and drivers
  • System packages (RPM/DEB)
  • Environment variables and setenv scripts
  • Cron jobs referencing the application
  • Log configuration and file paths
  • WAR/EAR contents (web.xml, dependencies, Maven coordinates)
  • Secrets in configuration files

MTA handles (code level):

  • Source code analysis and API compatibility
  • Deprecated API detection (javax -> jakarta, etc.)
  • Framework migration rules
  • Dockerfile and container manifest generation
  • Dependency vulnerability scanning
  • Code-level refactoring recommendations

Artifacts to pass to MTA:

  • WAR/EAR files listed in mta_handoff.artifacts_for_mta
  • Source repositories (if pom.xml/build.gradle found at build_info.build_files)
  • Configuration files listed in mta_handoff.config_for_mta

Example Output

{
  "host": "tomcat01.example.com",
  "scan_timestamp": "2026-04-10T14:30:00Z",
  "scan_version": "2.0.0",
  "scope": "java",
  "os": {
    "distribution": "Red Hat Enterprise Linux",
    "distribution_version": "8.9",
    "os_family": "RedHat",
    "kernel": "4.18.0-513.el8.x86_64",
    "architecture": "x86_64"
  },
  "hardware": {
    "vcpus": 4,
    "memory_mb": 16384,
    "swap_mb": 2048
  },
  "java_environments": [
    {
      "java_home": "/usr/lib/jvm/java-21-openjdk-21.0.10.0.7-1.el8.x86_64",
      "version": "21.0.10",
      "vendor": "Red Hat"
    },
    {
      "java_home": "/usr/lib/jvm/java-11-openjdk-11.0.25.0.9-3.el8.x86_64",
      "version": "11.0.25",
      "vendor": "Red Hat"
    }
  ],
  "discovered_applications": [
    {
      "name": "Apache Tomcat",
      "id": "tomcat",
      "category": "application_server",
      "version": "Server version: Apache Tomcat/9.0.93",
      "confidence": "high",
      "detection_methods": ["process", "service", "port", "filesystem"],
      "home_path": "/opt/tomcat",
      "jvm": {
        "java_version": "21.0.10",
        "java_home": "/usr/lib/jvm/java-21-openjdk",
        "java_vendor": "Red Hat",
        "heap_min": "512m",
        "heap_max": "2048m",
        "gc_algorithm": "G1GC",
        "system_properties": {
          "catalina.home": "/opt/tomcat",
          "catalina.base": "/opt/tomcat",
          "java.io.tmpdir": "/opt/tomcat/temp",
          "java.util.logging.config.file": "/opt/tomcat/conf/logging.properties"
        },
        "jvm_agents": [],
        "xx_flags": ["-XX:+UseG1GC", "-XX:MaxGCPauseMillis=200"]
      },
      "ports": [8080, 8443],
      "config_files": {
        "server_xml": {"path": "/opt/tomcat/conf/server.xml", "exists": true, "size": 7542},
        "context_xml": {"path": "/opt/tomcat/conf/context.xml", "exists": true, "size": 1234}
      },
      "deployments": [
        {
          "name": "myapp.war",
          "type": "war",
          "size_bytes": 45678912,
          "maven_coordinates": {
            "groupId": "com.example",
            "artifactId": "myapp",
            "version": "2.1.0"
          },
          "dependencies": [
            "spring-core-5.3.30.jar",
            "spring-web-5.3.30.jar",
            "hibernate-core-5.6.15.Final.jar",
            "postgresql-42.6.0.jar",
            "logback-classic-1.2.12.jar",
            "slf4j-api-1.7.36.jar"
          ],
          "web_xml": {
            "servlets": ["dispatcherServlet"],
            "filters": ["encodingFilter", "springSecurityFilterChain"],
            "listeners": ["org.springframework.web.context.ContextLoaderListener"]
          },
          "spring_boot": true,
          "jpa_configured": true
        }
      ],
      "jdbc_connections": [
        {
          "source_file": "/opt/tomcat/conf/context.xml",
          "url": "jdbc:postgresql://db.internal:5432/appdb",
          "host": "db.internal",
          "port": "5432",
          "database": "appdb",
          "driver": "org.postgresql.Driver"
        }
      ],
      "keystores": [
        {
          "path": "/opt/tomcat/conf/keystore.jks",
          "type": "JKS",
          "password_is_default": true,
          "aliases": ["tomcat"]
        }
      ],
      "external_connections": [
        {"remote_host": "db.internal", "remote_port": 5432, "protocol": "tcp"},
        {"remote_host": "redis.internal", "remote_port": 6379, "protocol": "tcp"}
      ],
      "log_config": {
        "type": "logback",
        "config_files": ["/opt/tomcat/webapps/myapp/WEB-INF/classes/logback-spring.xml"],
        "log_paths": ["/var/log/myapp/application.log"]
      },
      "system_packages": ["java-21-openjdk-21.0.10.0.7-1.el8.x86_64", "tomcat-native-1.2.39-1.el8.x86_64"],
      "environment_vars": {
        "CATALINA_HOME": "/opt/tomcat",
        "CATALINA_BASE": "/opt/tomcat",
        "JAVA_HOME": "/usr/lib/jvm/java-21-openjdk",
        "CATALINA_OPTS": "-Xms512m -Xmx2048m -XX:+UseG1GC"
      },
      "cron_jobs": [],
      "secrets_found": ["server.xml:keystorePass:/opt/tomcat/conf/server.xml"],
      "migration_flags": {
        "keystore_outside_app_home": false,
        "external_path_references": ["/opt/tomcat/temp", "/opt/tomcat/conf/logging.properties"],
        "session_persistence_configured": false,
        "clustering_configured": false,
        "jndi_datasources": 1,
        "custom_class_loader": false
      },
      "mta_handoff": {
        "analysis_target": "containerization",
        "artifacts_for_mta": ["myapp.war"],
        "config_for_mta": ["/opt/tomcat/conf/server.xml", "/opt/tomcat/conf/context.xml"],
        "this_collection_discovers": ["Runtime JVM configuration...", "..."],
        "mta_handles": ["Source code analysis...", "..."],
        "note": "MTA handles: dependency analysis, API compatibility, code-level migration issues. This collection handles: infrastructure-level discovery, runtime config, secrets, system packages."
      }
    }
  ],
  "summary": {
    "total_apps": 1,
    "high_confidence": 1,
    "medium_confidence": 0,
    "low_confidence": 0,
    "categories": {"application_server": 1},
    "total_java_environments": 2
  }
}
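
Since the report is plain JSON, downstream tooling can consume it with a few lines of code. The Python sketch below reads a report shaped like the example above and prints a one-line summary per application, plus a warning for keystores that use a default password. The `summarize_report` helper and the reports/ path are hypothetical.

```python
import json


def summarize_report(report):
    """Return summary lines for one per-host report dictionary."""
    lines = []
    for app in report.get("discovered_applications", []):
        jvm = app.get("jvm", {})
        lines.append(
            f"{report['host']}: {app['name']} "
            f"({app['confidence']} confidence, "
            f"heap {jvm.get('heap_min')}-{jvm.get('heap_max')})"
        )
        for keystore in app.get("keystores", []):
            if keystore.get("password_is_default"):
                lines.append(f"  default keystore password: {keystore['path']}")
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py reports/tomcat01.example.com.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print("\n".join(summarize_report(json.load(handle))))
```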

Roadmap

Future runtime support (not yet implemented):

  • Node.js / TypeScript applications
  • Python web applications (Django, Flask, FastAPI)
  • .NET / .NET Core applications
  • Go applications
  • Ruby on Rails

Design Decisions

  • Java-only scope -- focused depth over breadth. Deep JVM inspection, WAR introspection, and build analysis provide migration-critical data that shallow multi-runtime scanning cannot.
  • ignore_errors: true on all data-gathering tasks -- VMs in the wild have missing commands, restricted permissions, and non-standard layouts. The collection should always produce a report, even if partial.
  • ANSI stripping on all shell output -- terminals on Fedora/RHEL can inject escape codes into piped output via grep aliases.
  • Sensitive file redaction -- passwords and keys in config files are replaced with [REDACTED] before capture.
  • 1MB cap on config file reads -- prevents memory issues with unexpectedly large files.
  • 20 deployment cap -- WAR introspection is capped to prevent runaway execution on servers with many deployments.
  • Default keystore password probing -- tries changeit and changeme only. Never attempts brute force. Flags default passwords as a security finding.
  • No Tower/AAP dependency -- produces standalone JSON files that can be consumed by any downstream tool.
  • MTA boundary -- explicitly separates infrastructure discovery from code analysis to avoid duplicating MTA capabilities.
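
To illustrate the ANSI stripping decision listed above, the following Python sketch removes CSI escape sequences (the kind emitted by colorized grep) from captured output. The collection applies its own filter inside Ansible; this regular expression is an assumption based on the standard CSI sequence layout.

```python
import re

# CSI sequence: ESC [ then parameter bytes (0-?), intermediate bytes (space-/),
# and one final byte (@-~). Covers color codes such as ESC[01;31m and ESC[K.
ANSI_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove ANSI CSI escape sequences from shell output."""
    return ANSI_CSI_RE.sub("", text)


if __name__ == "__main__":
    colored = "\x1b[01;31m\x1b[Ktomcat\x1b[m\x1b[K.service"
    print(strip_ansi(colored))
```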

License

Apache-2.0