XLSX Analysis
Analyze Excel spreadsheets and perform comprehensive data analysis on XLSX files including structure inspection, data validation, and content analysis.
XLSX Analysis
Perform comprehensive analysis of Excel spreadsheet files (XLSX) within the sandbox environment with capabilities for structure inspection, data validation, statistical analysis, and content extraction.
📊 Excel Analysis Capabilities
XLSX analysis provides deep inspection of Excel workbooks including sheet structure, data types, formulas, charts, and comprehensive data validation with statistical insights.
Overview
The XLSX Analysis tool enables thorough examination of Excel spreadsheet files, supporting workbook structure analysis, data type detection, formula inspection, data validation, and statistical analysis of spreadsheet contents.
Key Features
- Workbook Structure Analysis - Inspect sheets, ranges, and organizational structure
- Data Type Detection - Automatic detection and validation of data types
- Formula Analysis - Extract and analyze Excel formulas and calculations
- Statistical Analysis - Comprehensive statistical analysis of numeric data
- Data Validation - Validate data integrity and identify anomalies
Methods
xlsxAnalysis
Analyze Excel spreadsheet files comprehensively.
| Parameter | Type | Required | Description |
|---|---|---|---|
| filePath | String | Yes | Path to the XLSX file in sandbox environment |
| analysisType | String | No | Analysis type: 'structure', 'data', 'formulas', 'comprehensive' (default: 'comprehensive') |
| includeStatistics | Boolean | No | Include statistical analysis of numeric data (default: true) |
| validateData | Boolean | No | Perform data validation and anomaly detection (default: true) |
| extractFormulas | Boolean | No | Extract and analyze Excel formulas (default: true) |
| sheetFilter | Array | No | Specific sheet names to analyze (default: all sheets) |
| dataRange | String | No | Specific cell range to analyze (e.g., 'A1:Z100') |
{
"filePath": "/sandbox/data/financial_report.xlsx",
"analysisType": "comprehensive",
"includeStatistics": true,
"validateData": true,
"extractFormulas": true
}Output:
success(Boolean) - Analysis operation success statusfilePath(String) - Path to analyzed XLSX fileworkbookInfo(Object) - Overall workbook informationtotalSheets(Number) - Number of worksheetsfileSize(Number) - File size in bytescreatedDate(String) - Workbook creation datemodifiedDate(String) - Last modification date
sheetAnalysis(Array) - Analysis results for each worksheetstatisticalSummary(Object) - Statistical analysis resultsformulaAnalysis(Object) - Formula extraction and analysisdataValidation(Object) - Data validation resultsanalysisTime(Number) - Analysis duration in milliseconds
Workbook Structure Analysis
Sheet and Range Analysis
Data Analysis and Statistics
Statistical Analysis
Formula and Calculation Analysis
Formula Extraction and Analysis
Error Handling
Common Analysis Issues
| Error Type | Cause | Resolution |
|---|---|---|
| File Format Error | Not a valid XLSX file | Verify file format and integrity |
| Corrupted Workbook | File corruption or partial download | Re-download or repair file |
| Password Protected | Workbook requires password | Provide password or use different file |
| Large File Timeout | File too large for analysis | Use specific sheet or range analysis |
| Formula Errors | Invalid formulas in spreadsheet | Review and fix formula syntax |
Robust XLSX Analysis
def robust_xlsx_analysis(xlsx_path, fallback_strategies=None):
"""Perform XLSX analysis with comprehensive error handling."""
if not fallback_strategies:
fallback_strategies = [
{"analysisType": "comprehensive", "includeStatistics": True, "extractFormulas": True},
{"analysisType": "data", "includeStatistics": True, "extractFormulas": False},
{"analysisType": "structure", "includeStatistics": False, "extractFormulas": False}
]
for i, strategy in enumerate(fallback_strategies):
try:
print(f"🔄 Attempting analysis strategy {i+1}")
analysis_params = {
"filePath": xlsx_path,
**strategy
}
result = xlsxAnalysis(analysis_params)
if result['success']:
print(f"✅ Analysis successful with strategy {i+1}")
return {
"success": True,
"strategy_used": i+1,
"strategy_details": strategy,
"results": result
}
else:
print(f"⚠️ Strategy {i+1} failed: {result.get('error')}")
except Exception as e:
print(f"💥 Strategy {i+1} exception: {str(e)}")
return {
"success": False,
"error": "All analysis strategies failed",
"strategies_attempted": len(fallback_strategies)
}
# Usage with error handling
robust_result = robust_xlsx_analysis("/sandbox/problematic/corrupted_workbook.xlsx")
if robust_result['success']:
print(f"Analysis completed using strategy {robust_result['strategy_used']}")
# Process results...
else:
print(f"Analysis failed: {robust_result['error']}")Integration Patterns
With Other Data Tools
# Integrate XLSX analysis with other data processing
def xlsx_to_statistical_analysis_pipeline(xlsx_path, output_format="json"):
"""Complete pipeline from XLSX analysis to statistical reporting."""
# Step 1: Analyze XLSX structure and data
analysis_result = xlsxAnalysis({
"filePath": xlsx_path,
"analysisType": "comprehensive",
"includeStatistics": True,
"validateData": True
})
if not analysis_result['success']:
return {"error": "XLSX analysis failed"}
# Step 2: Extract key insights
insights = {
"data_summary": analysis_result.get('statisticalSummary', {}),
"quality_assessment": analysis_result.get('dataValidation', {}),
"recommendations": []
}
# Step 3: Generate recommendations
statistical_summary = insights["data_summary"]
if 'numericColumns' in statistical_summary:
for col_name, stats in statistical_summary['numericColumns'].items():
if stats.get('missingValues', 0) > stats.get('count', 1) * 0.1:
insights["recommendations"].append(
f"Column '{col_name}' has high missing data rate - consider imputation"
)
# Step 4: Save results in requested format
output_path = xlsx_path.replace('.xlsx', f'_analysis.{output_format}')
if output_format == "json":
writeFile({
"filePath": output_path,
"content": insights,
"format": "json"
})
elif output_format == "csv":
# Convert to CSV-friendly format
csv_data = []
if 'numericColumns' in statistical_summary:
for col_name, stats in statistical_summary['numericColumns'].items():
csv_data.append([
col_name, stats.get('count', 0), stats.get('mean', 0),
stats.get('stdDev', 0), stats.get('min', 0), stats.get('max', 0)
])
writeFile({
"filePath": output_path,
"content": {
"headers": ["Column", "Count", "Mean", "StdDev", "Min", "Max"],
"rows": csv_data
},
"format": "csv"
})
print(f"📊 Analysis pipeline completed: {output_path}")
return {
"success": True,
"insights": insights,
"output_path": output_path
}
# Usage
pipeline_result = xlsx_to_statistical_analysis_pipeline("/sandbox/data/survey_responses.xlsx", "json")Related Tools
Statistical Analysis
Perform advanced statistical analysis on extracted data
Data Transformation
Transform and restructure Excel data for analysis
Read File
Read Excel files for data extraction and processing
Next Steps: Use with Statistical Analysis for advanced analytics, or Data Transformation for data restructuring.