logo_smallAxellero.io

XLSX Analysis

Analyze Excel spreadsheets and perform comprehensive data analysis on XLSX files including structure inspection, data validation, and content analysis.

XLSX Analysis

Perform comprehensive analysis of Excel spreadsheet files (XLSX) within the sandbox environment with capabilities for structure inspection, data validation, statistical analysis, and content extraction.

📊 Excel Analysis Capabilities

XLSX analysis provides deep inspection of Excel workbooks including sheet structure, data types, formulas, charts, and comprehensive data validation with statistical insights.

Overview

The XLSX Analysis tool enables thorough examination of Excel spreadsheet files, supporting workbook structure analysis, data type detection, formula inspection, data validation, and statistical analysis of spreadsheet contents.

Key Features

  • Workbook Structure Analysis - Inspect sheets, ranges, and organizational structure
  • Data Type Detection - Automatic detection and validation of data types
  • Formula Analysis - Extract and analyze Excel formulas and calculations
  • Statistical Analysis - Comprehensive statistical analysis of numeric data
  • Data Validation - Validate data integrity and identify anomalies

Methods

xlsxAnalysis

Analyze Excel spreadsheet files comprehensively.

ParameterTypeRequiredDescription
filePathStringYesPath to the XLSX file in sandbox environment
analysisTypeStringNoAnalysis type: 'structure', 'data', 'formulas', 'comprehensive' (default: 'comprehensive')
includeStatisticsBooleanNoInclude statistical analysis of numeric data (default: true)
validateDataBooleanNoPerform data validation and anomaly detection (default: true)
extractFormulasBooleanNoExtract and analyze Excel formulas (default: true)
sheetFilterArrayNoSpecific sheet names to analyze (default: all sheets)
dataRangeStringNoSpecific cell range to analyze (e.g., 'A1:Z100')
{
  "filePath": "/sandbox/data/financial_report.xlsx",
  "analysisType": "comprehensive",
  "includeStatistics": true,
  "validateData": true,
  "extractFormulas": true
}

Output:

  • success (Boolean) - Analysis operation success status
  • filePath (String) - Path to analyzed XLSX file
  • workbookInfo (Object) - Overall workbook information
    • totalSheets (Number) - Number of worksheets
    • fileSize (Number) - File size in bytes
    • createdDate (String) - Workbook creation date
    • modifiedDate (String) - Last modification date
  • sheetAnalysis (Array) - Analysis results for each worksheet
  • statisticalSummary (Object) - Statistical analysis results
  • formulaAnalysis (Object) - Formula extraction and analysis
  • dataValidation (Object) - Data validation results
  • analysisTime (Number) - Analysis duration in milliseconds

Workbook Structure Analysis

Sheet and Range Analysis

Data Analysis and Statistics

Statistical Analysis

Formula and Calculation Analysis

Formula Extraction and Analysis

Error Handling

Common Analysis Issues

Error TypeCauseResolution
File Format ErrorNot a valid XLSX fileVerify file format and integrity
Corrupted WorkbookFile corruption or partial downloadRe-download or repair file
Password ProtectedWorkbook requires passwordProvide password or use different file
Large File TimeoutFile too large for analysisUse specific sheet or range analysis
Formula ErrorsInvalid formulas in spreadsheetReview and fix formula syntax

Robust XLSX Analysis

def robust_xlsx_analysis(xlsx_path, fallback_strategies=None):
    """Perform XLSX analysis with comprehensive error handling."""
    
    if not fallback_strategies:
        fallback_strategies = [
            {"analysisType": "comprehensive", "includeStatistics": True, "extractFormulas": True},
            {"analysisType": "data", "includeStatistics": True, "extractFormulas": False},
            {"analysisType": "structure", "includeStatistics": False, "extractFormulas": False}
        ]
    
    for i, strategy in enumerate(fallback_strategies):
        try:
            print(f"🔄 Attempting analysis strategy {i+1}")
            
            analysis_params = {
                "filePath": xlsx_path,
                **strategy
            }
            
            result = xlsxAnalysis(analysis_params)
            
            if result['success']:
                print(f"✅ Analysis successful with strategy {i+1}")
                return {
                    "success": True,
                    "strategy_used": i+1,
                    "strategy_details": strategy,
                    "results": result
                }
            else:
                print(f"⚠️ Strategy {i+1} failed: {result.get('error')}")
        
        except Exception as e:
            print(f"💥 Strategy {i+1} exception: {str(e)}")
    
    return {
        "success": False,
        "error": "All analysis strategies failed",
        "strategies_attempted": len(fallback_strategies)
    }

# Usage with error handling
robust_result = robust_xlsx_analysis("/sandbox/problematic/corrupted_workbook.xlsx")
if robust_result['success']:
    print(f"Analysis completed using strategy {robust_result['strategy_used']}")
    # Process results...
else:
    print(f"Analysis failed: {robust_result['error']}")

Integration Patterns

With Other Data Tools

# Integrate XLSX analysis with other data processing
def xlsx_to_statistical_analysis_pipeline(xlsx_path, output_format="json"):
    """Complete pipeline from XLSX analysis to statistical reporting."""
    
    # Step 1: Analyze XLSX structure and data
    analysis_result = xlsxAnalysis({
        "filePath": xlsx_path,
        "analysisType": "comprehensive",
        "includeStatistics": True,
        "validateData": True
    })
    
    if not analysis_result['success']:
        return {"error": "XLSX analysis failed"}
    
    # Step 2: Extract key insights
    insights = {
        "data_summary": analysis_result.get('statisticalSummary', {}),
        "quality_assessment": analysis_result.get('dataValidation', {}),
        "recommendations": []
    }
    
    # Step 3: Generate recommendations
    statistical_summary = insights["data_summary"]
    if 'numericColumns' in statistical_summary:
        for col_name, stats in statistical_summary['numericColumns'].items():
            if stats.get('missingValues', 0) > stats.get('count', 1) * 0.1:
                insights["recommendations"].append(
                    f"Column '{col_name}' has high missing data rate - consider imputation"
                )
    
    # Step 4: Save results in requested format
    output_path = xlsx_path.replace('.xlsx', f'_analysis.{output_format}')
    
    if output_format == "json":
        writeFile({
            "filePath": output_path,
            "content": insights,
            "format": "json"
        })
    elif output_format == "csv":
        # Convert to CSV-friendly format
        csv_data = []
        if 'numericColumns' in statistical_summary:
            for col_name, stats in statistical_summary['numericColumns'].items():
                csv_data.append([
                    col_name, stats.get('count', 0), stats.get('mean', 0),
                    stats.get('stdDev', 0), stats.get('min', 0), stats.get('max', 0)
                ])
        
        writeFile({
            "filePath": output_path,
            "content": {
                "headers": ["Column", "Count", "Mean", "StdDev", "Min", "Max"],
                "rows": csv_data
            },
            "format": "csv"
        })
    
    print(f"📊 Analysis pipeline completed: {output_path}")
    return {
        "success": True,
        "insights": insights,
        "output_path": output_path
    }

# Usage
pipeline_result = xlsx_to_statistical_analysis_pipeline("/sandbox/data/survey_responses.xlsx", "json")

Next Steps: Use with Statistical Analysis for advanced analytics, or Data Transformation for data restructuring.